The pandemic
question


For now, the World Health Organization
is right to emphasize containment of the
new coronavirus.


More than 3,000 recorded deaths and
90,000 confirmed infections, and the
numbers are still rising. The coronavirus
that causes COVID-19 has spread to more
than 70 countries, with more nations being
affected daily. As new clusters emerge, all eyes are on the
World Health Organization (WHO).

Last week, the agency held back from describing the
outbreak as a pandemic – usually understood to mean the
spread across multiple regions of a disease that cannot be
contained (see p. 12). The WHO's decision was based partly
on the fact that most of the virus’s global spread can still be
traced to countries that have experienced large outbreaks,
such as China, Iran, Italy and South Korea. There are signs
– in China, for example, where the spread of disease seems
to be slowing – that the virus could yet be contained if the
right measures are put in place.

Another argument for not using ‘pandemic’ is that much
of the world is already on maximum alert. Countries are 
restricting travel; borders are being sealed; schools and 
public buildings are being shuttered; and gatherings, 
including research conferences, are being called off (see 
p. 13). Moreover, a huge effort is being made to trace and
track new outbreaks; researchers are collaborating across
borders to determine and share virus genome sequences;
vaccine development is under way; and many journals are
making all related research and data open access.

David Heymann, an infectious-disease epidemiologist at
the London School of Hygiene and Tropical Medicine who
led the WHO's response to severe acute respiratory syndrome
(SARS) in 2003, told Nature that he is not advising
the WHO to call it a pandemic at this point – partly because
the virus is not spreading in the same way as the pandemics
of the twentieth century, which claimed millions of lives.
There are also the economic implications to consider. Even
without the virus being described as a pandemic, the values
of stocks and shares have fallen sharply and some economies
are at risk of recession.

But the virus is still spreading daily, and more previously
undetected clusters will probably be found, such as those
recently discovered in the United States. Marc Lipsitch, 
an infectious-disease epidemiologist at the Harvard T.H. 
Chan School of Public Health in Boston, Massachusetts, 
told Nature that “under almost any reasonable definition 
of pandemic, there’s now evidence of it happening”.

Part of the difficulty for the WHO is that the impact of
a pandemic declaration in previous disease outbreaks is
hard to assess, because there are few examples to go on.
The 2002-03 outbreak of the SARS coronavirus, which
killed 774 people (out of a total of 8,098 infections, spread
across some two dozen countries), was not described as
a pandemic by the WHO. Neither was the 2014-16 Ebola
outbreak, which affected three countries in West Africa,
and resulted in 28,616 infections and 11,310 deaths.

In the case of SARS, Heymann says, most transmission 
occurred in clusters of infected health-care workers and 
hospital patients, and in the families of health-care work- 
ers, with occasional transmission in the wider community. 
A similar pattern was seen in the early outbreaks of the
new coronavirus in China, and is now occurring in other
countries. SARS “was not a pandemic in the sense of pandemic
influenza or cholera, where transmission was more
generalized”, Heymann says.

The WHO did declare the 2009-10 H1N1 influenza outbreak
a pandemic, partly to trigger the release of funding
for vaccine production. At present, however, there is no vaccine
against the virus that causes COVID-19. The agency has
also stopped using the definition of pandemic that it used
at that time. On that occasion, some people criticized the
agency for over-reacting – initial estimates of deaths were
about 18,600. But that number looks to have been an undercount,
and revised estimates of fatalities in the first year
that the virus circulated range from 150,000 to 575,000.
There were 61 million infections in the United States alone
(L. Simonsen et al. PLoS Med. 10, e1001558; 2013).

On previous occasions, much of the WHO's work involved
persuading reluctant governments to acknowledge the 
severity of an infectious-disease outbreak. Fortunately, 
that has changed with the virus that causes COVID-19. 

If past outbreaks are a guide, we are only in the foothills
of a new disease that could continue to spread for many
more months. All countries must put in place containment
measures. But the p-word should remain on the table. If 
the virus spread accelerates, it may be necessary to use it. 


A warning from
the forests of Africa 
and the Amazon 


Carbon analysis suggests faster 
emissions reductions are needed. 


As tropical forests grow, they pull carbon dioxide
out of the atmosphere – one of their many
services to humanity and the planet.
Decades of measurements in hundreds of
plots in Africa and South America show how
tropical trees such as Brazil nut (Bertholletia excelsa) and
kapok (Ceiba pentandra) absorbed as much as 4.4 billion
tonnes of carbon dioxide annually in the 1990s and early
2000s. That’s enough to more than offset the European
Union’s carbon emissions during the same period.

This effect is baked into many of the climate models 
that researchers use to project future global-warming 
scenarios. However, a study published in Nature this week
suggests that the benefits from this tropical carbon ‘sink’
might be fleeting (W. Hubau et al. Nature 579, 80-87; 2020).
And that could mean the international community will
need to pledge yet faster emissions reductions if the world
is to limit global warming to below 2 °C, in line with the 2015
Paris climate agreement.

An international team led by geographers from the 
University of Leeds, UK, reports on page 80 that the 
Amazon rainforest has been absorbing less atmos- 
pheric carbon each year since the early 1990s. Forests in 
Africa have also been absorbing less atmospheric carbon 
since around 2015. This is due in large part to rising tree 
mortality. 

Trees are dying, the researchers found, because 
temperatures are rising and droughts increasing, a trend
that is likely to continue as greenhouse gases build up. A
decade from now, Africa’s carbon sink will be 14% lower
compared with 2010-15. The Amazonian carbon sink is on
course to disappear completely by 2035. If that happens it
will result in more carbon dioxide in the atmosphere, and 
therefore more global warming. 

As we reported in a Feature last week, the Amazon's 
5 million square kilometres look more precarious than ever
(see Nature 578, 505-507; 2020). Average temperatures 
in this rainforest, which spans nine countries, have risen 
by 1-1.5 °C over the past century; there have been three
severe droughts since 2005 and tree clearing has shrunk 
the forest by 15% since the 1970s. Brazil, once praised for 
its efforts in slowing deforestation, lost 10,000 square 
kilometres last year – the largest drop for a decade. A
ten-year ban on planting sugar cane in the Amazon was 
lifted last November; and a bill to regulate oil and mining
exploration is making its way to the national congress, 
Brazil's parliament. 

In September, independent researchers from the region
formed a science panel to propose what needs to be done
to conserve the Amazon. The panel hasn't yet completed
its report, but its overarching message cannot be in doubt:
Brazil and other tropical nations need to halt deforestation
and promote new forests in degraded – and often
abandoned – lands.

At November's summit of the United Nations Framework
Convention on Climate Change in Glasgow, UK, participating
countries will be expected to redouble their pledges to
meet the Paris climate agreement's goals. If tropical carbon
sinks can no longer be fully relied upon to help reach
that target, it means more ambitious decarbonization will
be needed.

At the same time, the lesson for governments around
the world is clear enough: tropical forests are working for
humanity – and for countless other creatures. To protect
them, humanity must halt both deforestation and global
warming.


China changes tack


A new researcher-evaluation system must
not reduce international collaborations. 


China’s researchers and research institutions
are evaluated, ranked and funded according
to their record of publishing in journals covered
by the Science Citation Index (SCI), an
international database of articles and citation
records for around 9,000 journals.

The number of articles in these journals by authors at 
Chinese institutions increased nearly fourfold between 
2009 and 2019. In that time, China's researchers have 
increased international collaborations, which have helped
them secure international publications. But there have 
been concerns that widespread use of publication metrics 
incentivizes lower-quality work, as does the fact that some
institutions pay bonuses to those publishing in journals.

But that might be about to change. Last month, the Chinese
government ordered institutions to stop promoting
or recruiting solely on the basis of number of papers or
citations, and to end publishing bonuses (see page 18).

Research will still be evaluated, but institutions have until
the end of July to propose new indicators. An alternative 
system will need to measure research quality and inno- 
vation, and whether something represents a significant 
advance or helps to solve an important societal problem. 
Evaluators will need to rely more on peer judgement, 
and, crucially, researchers must consider publishing in 
non-SCI-indexed journals.

The change is significant, and intended to meet two 
important government objectives. First, it is designed to
help root out plagiarism, self-citation and colleagues citing
each other’s work to boost their citations. Second, it is
aimed at boosting China's own research-publishing industry,
which the government has wanted to do – but which is
difficult if the best research is published internationally.

To enable more domestic research publishing, 
the government last year allocated one billion yuan 
(US$143 million) over 5 years to improve the standards 
of some 280 Chinese journals, most of which publish in 
English. These journals have been ranked, with each of the 
top 22 receiving between one million and five million yuan
annually to help them attract a higher standard of submissions,
not only from China, but from around the world.

When this policy was announced, it wasn’t known how 
the publishers would use their subsidy or how the gov- 
ernment would measure success. The answers to both 
questions are now clearer. 

China's government is urging its researchers to play their
part by publishing in home-grown journals. That is important,
not least because it will make science more accessible
in China. But in setting up the new evaluation system, the
government must be careful to protect the collaborations
– and the relationships – that came with the old.


A personal take on science and society


World view 


By Nnaemeka 
Ndodo 


Extended US travel ban 


harms global science 


From preparing for pandemics to boosting
crop yields, Nigerian scientists who work 
and train abroad are making the world safer. 
Now that’s under threat. 


In the first weeks of the COVID-19 outbreak, I worked
with others to set up capabilities to detect the disease
at our National Reference Laboratory in Abuja, and at
the Lagos University Teaching Hospital. These cities
are the main points of entry into Nigeria – the most
populous country in Africa. We detected the country’s first
case of the coronavirus on 27 February, in Lagos. That Nigerian
researchers could act without having to wait for external
partners has made my country, and the world, safer.

But I fear that our capacity to act in this way is under
threat. Much of the expertise that my colleagues and I
drew on came from our experiences training and working
abroad. I've spent time at Graz in Austria; Yale University
in New Haven, Connecticut; and the Harvard Stem Cell
Institute in Cambridge, Massachusetts. Chikwe Ihekweazu,
who leads the Nigeria Centre for Disease Control and whose
support was crucial to setting up the country’s detection
facilities, also trained abroad (in his case, in Germany,
where an anti-immigrant attack took place last month).

This kind of expertise goes beyond global health: a 
Nigerian geneticist who trained at Cornell University in 
Ithaca, New York, is using genotyping to selectively breed
our indigenous cows so that they can resist disease and produce
high-quality milk. A Nigeria-based project to improve
cassava could save the crop in this country, and in the rest
of the world. Nigerian software engineers are developing 
medical-records databases and e-commerce applications 
that will strengthen my country’s health and economy. 
In short, many scientists from Nigeria who trained in the 
United States, the United Kingdom and elsewhere are 
returning home, inspired to improve people's lives. We 
should encourage more of this. 

Yet our ability to travel is contracting. On 21 February, 
people from Nigeria, and also from Myanmar, Eritrea, 
Kyrgyzstan, Sudan and Tanzania, joined those from Libya,
Iran, Somalia, Syria and Yemen – as well as from North
Korea and Venezuela – in facing restricted entry to the
United States. The extended travel ban does not officially
affect students or highly specialized workers. Practically, 
however, it means that scientists will find immigration 
more difficult and that, if they do get visas, their close 
family members will not be able to join them. 

When I learnt the news, I was stunned. In 2018, according
to the New American Economy research group, more than
645,000 immigrants arrived in the United States from the
6 countries recently added to the travel ban; more than half
of these immigrants were Nigerian. I do not see how my
country can be considered a threat to the United States. The
two countries have close diplomatic, economic and social
ties. The United Nations estimated that some 1.24 million 
people made up the Nigerian diaspora in 2017, with the 
United States attracting the vast majority of emigrants, 
and numbers were growing. 

These people have boosted prosperity in the countries 
that received them. Nigerians form one of the most educated
immigrant communities in the United States and paid
more than US$4 billion in federal, state and local taxes in
2018. They are an asset, not a liability. Whatever the rationale
for the baffling expansions to the ban, the consensus is
that it will disconnect Nigerians in the United States from
their families and make for a more hostile environment.

Even without such moves, travelling can be difficult. Just
the sight of my country’s green passport can bring excessive
scrutiny. On my very first international trip, airport
officials in a major European city searched my luggage.
When they found an anatomy book, they decided I must
be concealing drugs in my body. So they forced me to sign
a document at gunpoint, took me from the airport without
a coat (I had never experienced such cold) and performed
X-rays (another first) that found no contraband. When I
demanded an apology, they said I should blame my maltreatment
on compatriots selling narcotics. I was ready to
return to Nigeria immediately to avoid future ill treatment. I
am glad now that a colleague convinced me otherwise, and I
continued my journey to train in molecular bioengineering.

Already, many young Nigerian scientists wishing to 
study in the United States are repeatedly denied the 
sorts of training opportunity that I had, their visas being
turned down for one reason after another. This is so 
short-sighted: if a new human or crop pathogen arises 
here, we need top molecular biologists to detect it. For 
that, Nigerian scientists need to be in contact with their 
colleagues in other countries. The quality of training we 
received abroad gave us the confidence and expertise 
to work with leading scientists on COVID-19, to under- 
stand its genetic-sequence data and to use World Health 
Organization-approved target sequences (DNA primers 
and probes for amplifying any viral material) to set up 
early-detection capacity in our country. 

I am not saying that the ability to travel freely will on
its own result in world-class institutions in Nigeria. My 
country’s government needs to do more to build science 
here, for instance, by providing more (and more reliable) 
funding, and establishing a dependable power grid. 

The visa ban, no matter its justification, will worsen an 
already bad situation. It can only have a damaging effect on
the Nigerian economy and on Nigerian scientists. And that
– in an increasingly connected world – will hurt all of us.


The world this week


News in brief


CORONAVIRUS ENTERS 


DANGEROUS NEW PHASE 


The new coronavirus has spread
to more than 70 nations and
the total number of infections 
worldwide had passed 90,000 
as Nature went to press (see 
‘Rapid spread’). 

Researchers have warned that 
the surge in outbreaks outside 
China, where the virus emerged 
and most cases have occurred, 
means that the coronavirus is
becoming unstoppable.

The World Health
Organization has resisted
describing the situation as a
pandemic. Director-general
Tedros Adhanom Ghebreyesus
said on 2 March that there was
still a chance of containing
the virus. Mike Ryan, director
of the WHO's emergencies
programme, said that using the
word pandemic would mean
that efforts to contain and slow
the spread of the virus have
failed, which has proved untrue
in China, Singapore and other
regions.

But other scientists say 
the surge in international 
cases marks a tipping point.
“I think the epidemiological
conditions for a pandemic
are met,” says Marc Lipsitch,
an infectious-disease
epidemiologist at the Harvard
T.H. Chan School of Public
Health in Boston, Massachusetts.

He and others say that 
although containment measures 
seem to have kept outbreaks 
from escalating outside China 
for more than a month, such
procedures might soon become
unfeasible on a broader scale.
Those efforts have involved 
quickly identifying infected 
people and their close contacts, 
and isolating them to prevent 
further transmission. 

“We've got to think more
carefully about what measures
might be sustainable in terms
of reducing transmission
without shutting down cities
completely and stopping
people from moving,” says Ben
Cowling, an infectious-disease
epidemiologist at the University
of Hong Kong. 

The efforts include ‘social
distancing’, which reduces the
average chances that uninfected
people will encounter an
infected person. But some
epidemiologists say too little
is known about the outbreak to
deploy this effectively. 


RAPID SPREAD 


The new coronavirus has infected more than 90,000 people globally
and spread to more than 70 countries. The vast majority of cases –
some 80,000 – are in China, where the pathogen emerged.


BETELGEUSE STARTS
TO BRIGHTEN AGAIN


After a mysterious four-month
fading streak, the star known as
Betelgeuse could be on its way
to regaining its shine.

Easily recognizable as
the right ‘shoulder’ in the
constellation Orion, Betelgeuse
is normally one of the ten
brightest stars in the night sky.
But it began getting dimmer
in October last year, and by
mid-February it had lost more
than two-thirds of its brilliance
– a difference noticeable to the
naked eye. The star now appears
to be recovering, and has
brightened by around 10% from
its dimmest point.

Astronomers have proposed 
several explanations for the 
dimming. One is the emergence
of a large, unusually cool
convection cell – a blob of
cooling plasma on its surface. 
Another is that the star could be 
moving behind a dust cloud. 

Some have speculated that
the star's erratic swings in
brightness mean it might be
approaching the end of its
life. Astrophysicists predict
Betelgeuse will end in a
supernova sometime in the
next 100,000 years. But what
happens right before a star
explodes in this way is unknown,
and the exact timing of the fiery
end is impossible to predict.


Next stop, the
twilight zone




It is home to a majority of the marine fish biomass and
helps to remove an estimated 4 billion tonnes of carbon 
dioxide from the atmosphere each year. Now, scientists 
are gearing up to dive into the twilight zone, the largely 
unexplored ocean layer 200-1,000 metres deep that 
some worry is threatened by a changing climate and 
increased pressure from fishing. 

As part of a US$25-million mission, NASA will travel
to the North Atlantic in April to study the movement of 
carbon between the atmosphere and the deep ocean. 
Others will join the expedition thanks to a collaborative 
venture unveiled at the American Geophysical Union's 
ocean-science meeting in San Diego, California, last 
week. 

“This is literally the biggest investment ever made in
the twilight zone,” says Dave Siegel, an oceanographer 
at the University of California, Santa Barbara. He is 
heading the NASA mission, dubbed Export Processes 
in the Ocean from Remote Sensing, or EXPORTS. The
addition of a network of collaborators promises to 
bolster data-sharing and coordination with other 
research efforts around the world. “If we can federate, 
we can help each other,” he says. 


CORONAVIRUS 
NIXES MASSIVE 
PHYSICS MEETING 


One of the world’s biggest 
scientific conferences – the
March Meeting of the American
Physical Society (APS) –
was cancelled just before it
was scheduled to begin in 
Denver, Colorado, for fear of 
contributing to the spread of 
coronavirus. 

Hundreds of registered 
participants had already 
arrived in Denver when they 
received an e-mail from the
APS on 29 February. The week-long
meeting was set to begin
on 2 March, with more than
1,000 attendees. APS leaders
said that a major factor was the
decision by the US Centers for
Disease Control and Prevention
to issue the highest level of
travel warning for Italy and
South Korea, where coronavirus
outbreaks are growing rapidly.
The warning includes a
recommendation to avoid all 
non-essential trips. 

Still, physicists are finding 
ways to get the word out about 
their research, despite the 
cancellation. Some will record 
their talks and upload them to
virtualmarchmeeting.com, a
website quickly set up for this
purpose. The APS itself says
it will provide a platform for
sharing presentations, and is
asking registrants to submit 
links to their talks. Some 
scientists in Denver are holding 
informal get-togethers for their 
disciplines, a practice called 
unconferencing. 

“It was clear that nothing
formal was possible, like
recreating the whole meeting
virtually”, so speakers were
invited to post their own links to
an online spreadsheet instead,
says Karen Daniels, a physicist at
North Carolina State University 
in Raleigh who is leading one 
disciplinary effort. 




WHY RATS ARE 
NEW YORKERS TOO 


An analysis of the genomes of
New York City rats is offering
clues to the rodents’ ability
to thrive in urban jungles.
Researchers identified dozens 
of areas of the rat genome that 
were specific to animals in New 
York — including several linked 
to diet, behaviour and mobility. 
“I can’t help but be amazed by
the ways that rats have adapted 
to urban environments,” says 
Arbel Harpak, a population 
geneticist at the city’s Columbia 
University, who co-led the 
study (A. Harpak et al. Preprint
at bioRxiv http://doi.org/dnxd; 2020).
Harpak’s team sequenced the 
full genomes of 29 New York City 
rats, and compared them with 
those of rats from northeast 
China, the presumed ancestral 
home of brown rats (Rattus
norvegicus). The researchers
looked for genome regions
containing variations that were
likely to be so beneficial to New
York City rats that they quickly
became common. The scan
produced dozens of such genes,
including some associated with
diet, behaviour and mobility
– perhaps reflecting the
challenges, and delights, of life
in the Big Apple.
The scientists can’t yet say
how these genomic hallmarks 
influence the animals’ biology. 
But future tests in transgenic lab 
rats could help to explain.


The world this week


News in focus


The Marine Biological Laboratory in Woods Hole, Massachusetts. 


BIOLOGIST EXITS PRESTIGIOUS 
POST YEARS AFTER VIOLATING 
SEXUAL-HARASSMENT POLICY 


The incident raises important questions about how institutions handle accusations of
harassment that occurred at different universities — particularly in the #MeToo era. 


By Amy Maxmen 


For the past few years, graduate students
applying for a prestigious summer
course at the Marine Biological Laboratory
(MBL) in the harbourside
town of Woods Hole, Massachusetts,
have been quietly warned about the course's
co-director – Richard Schneider. In 2013, an
investigation at his institution, the University
of California, San Francisco (UCSF), found that 
he had violated its sexual harassment policy. 
Although media reports in 2017 had published
some details of Schneider's case,
the situation was discussed only in hushed
tones among researchers involved with the 
MBL embryology course. That changed in 
mid-January, when a young developmental 
biologist, Carolyn Dundes, tweeted: “Was 
super stoked to apply to an MBL course this 
summer but an ally informed me that the 
course co-director violated UCSF policy on 
sexual harassment.” 

Two days later, Schneider resigned. On 
24 January, he was replaced as co-director. 

This comes as a relief for some scientists
and alumni affiliated with the course who have
been uncomfortable ever since Schneider's
violation was made public in 2017 – a few
months before the first summer course that
he co-directed. (Because directorships last for
five years, it was expected he would finish in
2021.) In the past few years, scientists who have
participated in the programme have quietly
grappled with what to do. Some worried that
Schneider might repeat the offence; others felt
guilty by association; and some simply wished
it had been addressed head on. Dundes found
it troubling enough to abandon plans to apply.

“It’s horrible – every summer, the students
find out,” says one instructor, who asked for
anonymity to protect against retribution.
Several other scientists who have taught or
taken the course spoke to Nature on condition 
of anonymity for the same reason that they 
didn’t speak up earlier: the MBL embryology 
course is taught by high-ranking biologists 
who wield significant influence in their fields. 
Early-career researchers say that speaking up 
could cost them collaborations, grants or jobs.

The head of the MBL, developmental
biologist Nipam Patel, declined to comment
on whether he had received complaints about
Schneider from students or visiting scientists
in previous years. However, he says the MBL
has policies barring harassment of all types
at the institute.

The public discussion about Schneider and 
his sudden departure reflect a growing concern
about sexual harassment in academia.
And they raise important questions about how
institutions handle accusations of harassment
that occurred at different universities. Many,
including the MBL, lack policies about vetting
candidates for previous misconduct, which
can be especially difficult given that attitudes
and discussion about the subject have changed
in the past 2.5 years.

“Academic institutions are struggling
with how to deal with allegations that pre-dated
the #MeToo movement,” says Debra
Katz, a civil-rights lawyer specializing in
sexual-assault and harassment cases at the
firm Katz, Marshall & Banks in Washington
DC. The hashtag #MeToo went viral in October
2017. And now, at the MBL and elsewhere, Katz
says, “Students are responding to the cultural
shift, and saying, ‘No, we don’t want to
be in close proximity with someone who has
harassed other students in academia.’”


The investigation

The MBL discussion concerns a covert 
sexual relationship between Schneider and 
a graduate student, which began weeks after
she joined his lab in 2008, at the age of 22. 
The details of their sexual relationship are 
described in a report by a committee that 
investigated a complaint the student filed 
to UCSF in 2012. UCSF provided a redacted 
version of the report to Nature.

The student, who requested anonymity to 
protect her from stigmatization, told Nature 
that the physical relationship started when 
Schneider invited her to a party at UCSF. They
drank alcohol, then went to a strip club, where
the student says their first sexual encounter
happened – and this is substantiated in the
investigation report. “At the time, I felt like he
valued me scientifically,” she recalls. “I felt like
this is what a fun scientist would do.”

For the next two and a half years, Schneider
and the student had a sexual relationship that
they kept private. The student says she experienced
mounting anxiety over the relationship.
“I didn’t realize how dependent I was on his
approval – what conferences I could go to,
what projects I could work on, my references,”
she says. “He was my thesis adviser, I couldn't
graduate without his approval.”

In 2012, she asked for formal mediation 
because she could no longer work in 
Schneider's presence. She says that Schneider 
told her that if others found out about their
relationship, it would ruin both of their reputations.
Looking back on their relationship,
she says, “I don’t think it could be called consensual
with that kind of power imbalance.”

The investigation, which interviewed 13 
witnesses, found that “although the rela- 
tionship may have begun as consensual, the 
evidence supports a finding that the Com- 
plainant, at some point, felt coerced to con- 
tinue the relationship and reasonably believed 
that she had no choice but to continue the
relationship lest it damage her career”.

Schneider did not reply to multiple requests
for comment. But in the investigation report 
from UCSF, Schneider “maintains that their 
relationship was welcome and consensual 
from beginning to end”. 

The report concludes that Schneider's
“actions and behavior are in violation of the
UC Policy on Sexual Harassment”. Two years
later, in February 2015, UCSF chancellor Sam
Hawgood informed Schneider through a letter
that he would be disciplined with a demotion 
from professor to associate professor. 

The next year, Schneider won a ‘Mentor of 
the Year’ award from UCSF. (The university says
he was selected for the prize by students.) He
continues to supervise researchers in his UCSF
lab. Meanwhile, the student left academia after
earning her PhD. “I went into a deep depression,”
she says to Nature. “I had panic attacks
and crippling nightmares for years.” 


Intense environment 
Schneider's career continued to advance. In 
December 2016, the MBL announced that 
he would co-direct its embryology summer 
course. During these programmes, around 
20 trainees, mainly in their early twenties, live
alongside the course directors for six weeks in
Woods Hole. Patel says that Schneider's violation
wasn’t known when he was appointed.
But several people affiliated with the course
said they discovered the violation soon afterwards.
That's because in early 2017, in response
to public-records requests, the University of 
California gave media outlets more than 100 
redacted records on harassment cases across 
its campuses from 2013 to 2016. The Mercury 
News, a paper based in the San Francisco Bay
area, reported on Schneider's case. In March,
two databases on sexual harassment in academia
posted his violation online.

© 2020 Springer Nature Limited. All rights reserved.

By 2019, many graduate students and postdoctoral
researchers in the course were aware
of Schneider's past because their colleagues
had sent them links to the databases and media
articles. “I was frankly very frustrated because
the embryology course is known to be amazing,
so I went but was on guard,” says a graduate
student who took the course last year,
and who asked to remain anonymous to avoid
retribution. The student adds, “Sometimes
I would imagine the person who almost left
grad school because of [Schneider's] actions, 
and wonder what that person would think.” 

At least one trainee wasn't bothered. “Rich
[Schneider] paid his debt to society, and there
are a lot of male scientists who have never been
caught,” the researcher says on condition of
anonymity. 

On 14 July 2019, the last day of that year's 
course, Schneider brought up his violation 
during an ethics lesson, and apologized if 
it had made the students uncomfortable, 
according to a few students present. “I don’t 
think anyone commented,” one of them recalls.

But the situation changed quickly after 
Dundes's tweet on 14 January. Within 42 hours,
more than 14,000 people had seen the tweet,
and 824 had clicked on a link that Dundes had
posted to an account of Schneider's violation
in UCSF's student newspaper, Synapse. Mark
Peifer, a cell biologist at the University of North
Carolina, Chapel Hill, replied with a link to an
entry on Schneider in one of the databases of
sexual misconduct. “This is really disturbing
- @MBLScience -- what do you say about this,”
he wrote. 

Patel says the MBL has been developing a 
plan for how to vet investigators who violated
codes of misconduct elsewhere. “Frankly, most
institutions are not going to tell us this information,”
Patel says, “so that is our challenge.”

But it's not all that hard, counters Julie 
Libarkin, a geologist at Michigan State Uni- 
versity in East Lansing, who created one 
of the online databases of substantiated 
sexual-harassment claims in 2016. Schnei- 
der’s case and more than 1,000 others are in 
it. Libarkin acknowledges that her database is
incomplete because it includes only records
that have been made public, not those that
were handled confidentially by institutions. 

“A good step would be to require all job 
candidates to affirm that there has never been
a formal or informal finding of misconduct
against them,” she says. “In order to have a
sustainable academic system, we need to
put people before everything else,” she adds.
“These are deep and troubling conversations
to have, but they are so important.”


Amy Maxmen, a senior reporter at Nature,
attended the MBL course in 2003.


Uranus (left) and Neptune (right), imaged by Voyager 2, the only probe to have visited them. 


RARE CHANCE TO
REACH ICE GIANTS
EXCITES SCIENTISTS


A planetary alignment provides a favourable window
for visiting Uranus and Neptune, but time is tight.


By Elizabeth Gibney 


Momentum is building among
planetary scientists to send a major
mission to Uranus or Neptune, the
most distant and least explored
planets in the Solar System. Huge
gaps remain in scientists’ knowledge of the
blueish planets, known as the ice giants, which
have been visited only once by a space probe.
But the pressure is on to organize a mission in
the next decade, because scientists want to 
take advantage of an approaching planetary 
alignment that would cut travel time. 
Interest in the ice giants has grown 
exponentially, says Amy Simon, a planet- 
ary scientist at NASA's Goddard Space 
Flight Center in Greenbelt, Maryland, who 
co-organized a meeting at the Royal Society 
in London in January, dedicated to exploring 
ideas for such a mission. NASA's Voyager 2 is 
the only spacecraft to have visited Uranus and
Neptune, in brief fly-bys in the 1980s. The ice
giants therefore represent fresh territory for
a wide range of researchers, including those
who study planetary rings, atmospheres, moons
and oceans, says Simon.


Jovian boost 

The celestial alignment, between Neptune, 
Uranus and Jupiter, next occurs in the early 
2030s, and would allow a spacecraft to slingshot
around Jupiter on its way to the planets. This
would reduce the travel time, and allow the craft
to arrive within the lifetimes of its instruments
and power systems, usually about 15 years.
It would also cut fuel mass, enabling the craft
to carry a full suite of scientific instruments
(see ‘Journey to the ice giants’). To take advantage
of the alignment, a mission to Neptune
would need to launch by around 2031 and 
one to Uranus by the mid-2030s.

JOURNEY TO THE ICE GIANTS
Scientists want to send a major mission to Uranus or Neptune during a beneficial planetary alignment in the 2030s. Using existing technology, a spacecraft could slingshot around […] planets

The window is “the right time to launch”,
Mark Hofstadter, a planetary scientist at 
the Jet Propulsion Laboratory in Pasadena, 
California, said at the London meeting. “We 
don’t want to miss this one.” But the timing is 
tight. NASA is the most likely space agency to
lead the kind of multibillion-dollar ‘flagship’ 
mission that scientists want. These typically 
take seven to ten years to prepare, and any 
green light from NASA would depend on the 
mission being prioritized in the agency's Planetary
Science Decadal Survey, which reports in
2022. A mission to Neptune or Uranus would 
also face competition from proposals to return 
a sample from Mars or explore Venus. 

But whereas Mars and Venus scientists are 
building on decades of exploration, “Uranus 
and Neptune are genuinely out on their own, 
as we haven't completed the very first phase 
of their exploration yet”, says Leigh Fletcher,
a planetary scientist at the University of 
Leicester, UK, who co-organized the meeting. 

Fletcher says that a mission to either planet
should include going into orbit around it and
sending a probe into its atmosphere or to one
of its moons, as the Cassini-Huygens mission
did at Saturn.


Blue mysteries 
Scientists think of the two planets as twins 
because of their similar sizes and masses. But 
no one knows how similar they are, what their
composition is or how they formed, Ravit Helled, a
planetary scientist at the University of Zurich,
Switzerland, told the meeting. Models struggle
to explain the planets’ structures, and why
more distant Neptune seems to be warmer
than Uranus. It’s assumed that they are made
of forms of water, or maybe ammonia ice, said
Helled. “But actually we don't really know that.”
A major mission to the ice giants would
also benefit exoplanet studies, said Hannah 
Wakeford, an exoplanet scientist at the Uni- 
versity of Bristol, UK. About 40% of known 
exoplanets are ice-giant-sized; understand- 
ing what these planets’ sizes and atmos- 
pheres reveal about their formation relies on 
understanding those in our own Solar System.
Delegates at the meeting agreed that they 
would be happy to visit either planet, because 
both would yield rich results. Studies show 
that it would be feasible to send probes in 
a single mission to both planets, but this 
would be prohibitively expensive. Neptune 
is appealing because its moon Triton seems 
to be geologically active and might host a 
subsurface ocean, potentially of liquid water. 
But Uranus, which has a magnetic field
that is tilted relative to its rotation axis, has
more “odd” features than Neptune does, 
which challenge existing scientific models, 
said Hofstadter. The later launch window for 
Uranus also makes the planet a more realistic
target, says Fletcher. 

News in focus


CHINA BANS CASH 


REWARDS FOR 
PUBLISHING 


New policy tackles perverse incentives that might
encourage questionable research practices. 


By Smriti Mallapaty 


Chinese institutions have been told
to stop paying researchers bonuses
for publishing in journals, as part of
a new national policy to cut perverse
incentives that encourage scientists 
to publish lots of papers rather than focus on 
high-impact work. 

In an order released last week, China’s 
science and education ministries also say 
that institutions must not promote or recruit 
researchers solely on the basis of the number
of papers they publish, or their citations. 
Researchers are welcoming the policy, 
but say that it could reduce the country’s 
competitiveness in science. 

In China, one of the main indicators used
to evaluate researchers, allocate funding and
rank institutions is metrics collected by the
Science Citation Index (SCI), a database of
articles and citation records for more than
9,000 journals. Since 2009, the number of
articles in these journals written by authors
from Chinese institutions has increased from
some 120,000 a year to 450,000 in 2019. Some
institutions even pay researchers bonuses for 
publishing in them. 

These practices have incentivized 
researchers to publish lots of papers at the 
expense of quality, says Jin Xuan, a chemical 
engineer at Loughborough University, UK. 
Evidence suggests that the focus on metrics 
has also driven a rise in inappropriate 
practices, such as researchers submitting 
plagiarized or fraudulent papers, or inappropriately
citing their own or a colleague’s work
to boost citations (L. Tang et al. J. Assoc. Inf. Sci.
Tech. 66, 1923–1932; 2015).

The goal of the new policy is not to 
discourage Chinese researchers from publishing
papers in SCI-listed journals, but to
stop inappropriate publishing and citation
practices, says Tang Li, a researcher of science
and technology policy at Fudan University in 
Shanghai, China. 

Xuan adds that the policy aligns well with 
global declarations, such as the San Francisco
Declaration on Research Assessment, that aim
to move away from an over-reliance on these
types of metric in research appraisals and to
limit perverse incentives that drive researchers
to engage in questionable research practices.

As part of the new policy, researcher assessments
will now need to use indicators of the
quality of research, such as how innovative
the work is, and whether it represents a significant
scientific advance or contributes to
solving important societal problems. These
evaluations should also rely more heavily on
the professional opinions of expert peers, and
consider research in journals published in 
China, many of which are not listed in the SCI. 

But Futao Huang, who studies higher-ed- 
ucation policy at Hiroshima University, 
Japan, says it is not clear what exactly the 
new evaluation system will look like, because
the ministry's notices lack specific, practical
recommendations. 

Huang thinks the new measures could result 
in a drop in the number of low-quality or fraudulent
papers, but might also trigger a decline in
China's total publications in indexed journals
as researchers feel less pressure to publish to 
gain degrees, promotions or funding. 

And fewer Chinese papers in indexed 
journals could affect the country’s research 
competitiveness, says Huang. International 
researchers might be less inclined to 
collaborate with Chinese academics without a
publication record in these journals, and fewer
papers could push Chinese universities lower
down in international rankings, he says.

Xuan says the focus on assessing researchers
on the basis of their work in Chinese journals
is controversial because a lot of them publish
in Mandarin, and the journals are unknown to
scientists outside China. 

Other scientists have raised concerns about
the new assessments relying too heavily on 
peer reviews, which are subjective and could 
create conflicts of interest or place too much 
emphasis on personal relationships. 


MYSTERY DEEPENS 
OVER ANIMAL SOURCE 
OF CORONAVIRUS 


Pangolins are a prime suspect, but a slew of genetic 
analyses has yet to find conclusive proof. 


By David Cyranoski 


Scientists are racing to identify the
source of the coronavirus causing
havoc around the world. Last month,
Chinese researchers suggested, on
the basis of genetic analyses, that the
scaly, ant-eating pangolin was the prime suspect.
But scientists have now examined those
data, along with three similar genome studies,
and say that although the mammal is still a
contender, the mystery is far from solved. 
Health officials want to pin down the virus's
source so they can prevent new outbreaks. Scientists
assume that the pathogen jumped to
people from an animal, as other coronaviruses
have; for example, the virus that causes severe
acute respiratory syndrome (SARS) is thought
to have jumped to humans from civets in 2002.
Dozens of people infected early in the current
outbreak worked at a live-animal market in the
Chinese city of Wuhan, but tests of coronavirus
samples found at the market have yet to
identify a source.

Three separate Chinese teams are trying to
trace the origin of the coronavirus. Researchers
at the South China Agricultural University
in Guangzhou suggested pangolins as the animal
source at a press conference on 7 February.
Pangolins are sought-after in China for their
meat and scales. Although the animals can’t be
sold in China owing to a worldwide ban, they
are still smuggled in from elsewhere in Asia and
Africa. The researchers said they had found a
coronavirus in smuggled pangolins that was a
99% genetic match to the human virus. 

But the result did not pertain to the entire 
genome. In fact, it related to a specific site 
known as the receptor-binding domain (RBD),
say the study's authors, who posted their analysis¹
on the preprint server bioRxiv on 20 February.
The press-conference report was the
result of an “embarrassing miscommunication
between the bioinformatics group and the
lab group of the study”, explains Xiao Lihua,
a parasitologist at the South China Agricultural
University and a co-author of the paper.
A whole-genome comparison found that the
pangolin and human viruses share 90.3% of
their DNA.

Pangolins are often smuggled into China, where there is demand for their meat and scales.

The RBD is a crucial part of coronaviruses
that allows them to latch onto and enter a cell.
Even a 99% similarity between two viruses’
RBDs is not necessarily enough to link them,
says Linfa Wang, a virologist at Duke-National
University of Singapore Medical School. 


Three similar comparison studies were also 
posted on bioRxiv last month. One, posted 
on 18 February, found² that coronaviruses in
frozen cell samples from illegally trafficked 
pangolins shared between 85.5% and 92.4% 
of their DNA with the virus found in humans. 

Two other papers³,⁴ published on 20 February
studied coronaviruses from smuggled
pangolins. These showed 90.23% and 91.02%
similarity, respectively, to the new coronavirus.

Higher genetic similarity is needed before 
the host can be definitively identified, says 
Arinjay Banerjee, who studies coronaviruses 
at McMaster University in Hamilton, Canada. 
He notes that the SARS virus shared 99.8% of 
its genome with a civet coronavirus.
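
The article does not say how each study arrived at its similarity figure. As a rough illustration only, a percentage of this kind can be computed by counting matching positions between two sequences that have already been aligned. The function name `percent_identity` and the toy sequences below are assumptions made for this sketch, not anything taken from the preprints.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences match.

    Assumes the sequences are already aligned and equal in length.
    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy sequences invented for illustration; nine of ten positions match.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

The result depends on which stretch of the genome is compared, which is why a figure for one region, such as the RBD, can differ sharply from a whole-genome figure.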

So far, the closest match to the new 
coronavirus has been found in a bat in China’s
Yunnan province. A study⁵ published on
3 February found that the bat coronavirus
shared 96% of its genetic material with the
virus that causes COVID-19. Bats could have
passed the virus to people, but scientists think
it was probably transmitted through an intermediate
host.

But if pangolins are a host, and they came 
from another country, it raises the question of
why there haven't been reports of people being
infected there, asks Jiang Zhigang, an ecologist
at the Chinese Academy of Sciences Institute of
Zoology in Beijing. 

1. Xiao, K. et al. Preprint at bioRxiv https://doi.org/10.1101/2020.02.17.951335 (2020).
2. Lam, T. T.-Y. et al. Preprint at bioRxiv https://doi.org/10.1101/2020.02.13.945485 (2020).
3. Liu, P. et al. Preprint at bioRxiv https://doi.org/10.1101/2020.02.18.954628 (2020).
4. Zhang, T., Wu, Q. & Zhang, Z. Preprint at bioRxiv https://doi.org/10.1101/2020.02.19.950253 (2020).
5. Zhou, P. et al. Nature https://doi.org/10.1038/s41586-020-2012-7 (2020).



Nature | Vol 579 | 5 March 2020 | 19


Feature 


Powerful magnetic and electric fields whip charged particles around, in a computer simulation of a spinning neutron star.


THE STRANGE HEARTS 
OF NEUTRON STARS 


Space observations are poised to reveal more about the centre 
of one of the Universe’s most enigmatic objects. By Adam Mann 


When a massive star dies in a
supernova, the explosion is
only the beginning of the end.
Most of the stellar matter is
thrown far and wide, but the
star’s iron-filled heart remains
behind. This core packs as
much mass as two Suns and
quickly shrinks to a sphere that would span
the length of Manhattan. Crushing internal
pressure (enough to squeeze Mount Everest
to the size of a sugar cube) fuses subatomic
protons and electrons into neutrons.

Astronomers know that much about how
neutron stars are born. Yet exactly what happens
afterwards, inside these ultra-dense
cores, remains a mystery. Some researchers
theorize that neutrons might dominate all the
way down to the centre. Others hypothesize 
that the incredible pressure compacts the 
material into more exotic particles or states 
that squish and deform in unusual ways. 

Now, after decades of speculation, researchers
are getting closer to solving the enigma,
in part thanks to an instrument on the Inter- 
national Space Station called the Neutron Star 
Interior Composition Explorer (NICER). 

Last December, this NASA space observatory
provided astronomers with some of the most
precise measurements ever made of a neutron
star’s mass and radius, as well as unexpected
findings about its magnetic field. The NICER
team plans to release results about more stars
in the next few months. Other data are coming
in from gravitational-wave observatories,
which can watch neutron stars contort as they
crash together. With these combined observations,
researchers are poised to zero in on what
fills the innards of a neutron star.

For many in the field, these results mark 
a turning point in the study of some of the 
Universe’s most bewildering objects. “This is 
beginning to be a golden age of neutron-star 
physics,” says Jürgen Schaffner-Bielich, a
theoretical physicist at Goethe University in 
Frankfurt, Germany. 

Launched in 2017 aboard a SpaceX Falcon 
9 rocket, the US$62-million NICER telescope
sits outside the space station and collects
X-rays coming from pulsars: spinning neutron
stars that radiate charged particles and
energy in enormous columns that sweep
around like beams from a lighthouse. The
X-rays originate from million-degree hotspots
on a pulsar’s surface, where a powerful
magnetic field rips charged particles off
the exterior and slams them back down at the 
opposing magnetic pole. 

NICER detects these X-rays using 56 gold- 
coated telescopes, and time-stamps their 
arrival to within 100 nanoseconds. With this 
capability, researchers can precisely track 
hotspots as a neutron star whips around at 
up to 1,000 times per second. Hotspots are 
visible as they swing across the object. But neutron
stars warp space-time so strongly that
NICER also detects light from hotspots facing
away from Earth. Einstein's general theory of
relativity provides a way to calculate a star’s
mass-to-radius ratio through the amount of
light-bending. That and other observations
allow astrophysicists to pin down the masses
and radii of the deceased stars. Those two
properties could help in determining what is
happening down in the cores.


Deep, dark mystery 
Neutron stars get more complicated the 
deeper one goes. Beneath a thin atmosphere
made mostly of hydrogen and helium, the
stellar remnants are thought to boast an
outer crust just a centimetre or two thick
that contains atomic nuclei and free-roaming
electrons. Researchers think that the ionized
elements become packed together in the next
layer, creating a lattice in the inner crust. Even
further down, the pressure is so intense that
almost all the protons combine with electrons
to turn into neutrons, but what occurs beyond
that is murky at best (see ‘Dense matter’). 

“It’s one thing to know the ingredients,” says
Jocelyn Read, an astrophysicist at California
State University, Fullerton. “It’s another to
understand the recipe, and how those ingredients
are going to interact with each other.”

Physicists have some idea of what happens,
thanks to particle accelerators on Earth. At 
facilities such as Brookhaven National Labo- 
ratory in Upton, New York, and CERN's Large 
Hadron Collider near Geneva, Switzerland, 
researchers have smashed together heavy 
ions, such as those of lead and gold, to cre- 
ate brief collections of monumentally dense 
material. But these kinetic experiments gen- 
erate billion- or even trillion-degree flashes, 
in which protons and neutrons dissolve into a
soup of their constituent quarks and gluons.
Terrestrial instruments have a hard time probing
the relatively mild millions-of-degrees
conditions inside neutron stars. 

There are multiple ideas about what might 
occur. It could be that quarks and gluons roam
freely. Or, the extreme energies could lead
to the creation of particles called hyperons.
Like neutrons, these particles contain three
quarks. But whereas neutrons contain the
most basic and lowest-energy quarks, known
as up and down quarks, a hyperon has at least
one of those replaced with an exotic ‘strange’
quark. Another possibility is that the centre
of a neutron star is a Bose-Einstein condensate,
a state of matter in which all subatomic
particles act as a single quantum-mechanical
entity. And theorists have dreamt up even 
more outlandish prospects, too. 


“It’s one thing to know
the ingredients. It’s
another to understand
the recipe.”


Crucially, each possibility would push back 
in a characteristic way against a neutron star's
colossal gravity. They would generate different
internal pressures and therefore a larger
or smaller radius for a given mass. A neutron
star with a Bose-Einstein condensate centre,
for instance, is likely to have a smaller radius
than one made from ordinary material such
as neutrons. One with a core made of pliable
hyperon matter could have a smaller radius still.

“The types of particles and the forces 
between them affect how soft or squashy the 
material is,” says Anna Watts, a NICER team
member at the University of Amsterdam. 

Differentiating between the models will 
require precise measurements of the size 
and mass of neutron stars, but researchers 
haven't yet been able to push their techniques
to fine-enough levels to say which possibility
is most likely. They typically estimate masses
by observing neutron stars in binary pairs. As
the objects orbit one another, they tug gravitationally
on each other, and astronomers can
use this to determine their masses. Roughly
35 stars have had their masses measured in
this way, although the figures can contain
error bars of up to one solar mass. A mere
dozen or so have also had their radii calculated,
but in many cases, the techniques can’t
determine this value to better than a few kilometres,
as much as one-fifth of the size of a
neutron star. 

NICER's hotspot method has been used by 
the European Space Agency's XMM-Newton 
X-ray observatory, which launched in 1999 and
is still in operation. NICER is four times more
sensitive and has hundreds of times better
time resolution than XMM-Newton. Over
the next two to three years, the team expects
to be able to use NICER to work out the masses
and radii of another half a dozen targets, pinning
down their radii to within half a kilometre.
With this precision, the group will be well
placed to begin plotting out what is known
as the neutron-star equation of state, which
relates mass to radius or, equivalently, internal
pressure to density. 

If scientists are particularly lucky and nature
happens to serve up especially good data,
NICER might help eliminate certain versions
of this equation. But most physicists think
that, on its own, the observatory will probably
narrow down rather than completely rule
out models of what happens in the mysterious
objects’ cores. 

“This would still be a huge advance on where
we are now,” says Watts.


Field lines 


NICER’s first target was J0030+0451, an
isolated pulsar that spins roughly 200 times 
per second and is 337 parsecs (1,100 light 
years) from Earth, in the constellation Pisces. 

Two groups, one based primarily at the
University of Amsterdam and another led
by researchers at the University of Maryland
in College Park, separately sifted through
850 hours of observations, serving as checks
on one another.

Because the hotspot light curves are so complex,
the groups needed supercomputers to
model various configurations and work out
which ones best fit the data. But both came up
with similar results, finding that J0030 has a
mass that is 1.3 or 1.4 times that of the Sun, and
a radius of roughly 13 kilometres.
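
To give a feel for what a mass of 1.3 or 1.4 Suns packed into roughly 13 kilometres means, the sketch below computes the standard dimensionless compactness, GM/(Rc²), and the Schwarzschild radius, 2GM/c². It is a back-of-the-envelope illustration using rounded textbook constants. It is not the NICER teams' analysis, which models full hotspot light curves, and the function names are assumptions made for this example.

```python
# Rounded textbook constants in SI units.
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8         # speed of light, m/s
M_SUN = 1.989e30    # mass of the Sun, kg


def compactness(mass_solar: float, radius_km: float) -> float:
    """Dimensionless compactness GM / (R c^2) of a spherical star."""
    return G * mass_solar * M_SUN / (radius_km * 1e3 * C**2)


def schwarzschild_radius_km(mass_solar: float) -> float:
    """Schwarzschild radius 2GM / c^2, in kilometres."""
    return 2 * G * mass_solar * M_SUN / C**2 / 1e3


# Values quoted for J0030: 1.3 or 1.4 solar masses, radius about 13 km.
for mass in (1.3, 1.4):
    print(mass, round(compactness(mass, 13.0), 3),
          round(schwarzschild_radius_km(mass), 1))
```

With these inputs the compactness comes out at about 0.15 to 0.16, and the 13-kilometre radius is only a few times the Schwarzschild radius of about 4 kilometres, which is why light from the far side of the star is bent strongly enough to reach the telescope.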

Those results are not definitive: they could
be used to support either the mundane or the
otherworldly predictions for what's inside the
guts of neutron stars. “There’s no requirement
for anything funky or crazy or exotic yet,” says
Andrew Steiner, a nuclear astrophysicist at the
University of Tennessee, Knoxville. 

Researchers got a bigger surprise with 
findings about the shape and position of 
the hotspots. The canonical view of neutron 
stars has their magnetic field lines looking like 
those surrounding a bar magnet, with north 
and south sides emerging from circular spots
at opposing ends of the star. By contrast, the
Dutch supercomputer simulations implied
that both of J0030's hotspots are in its southern
hemisphere, and that one of them is long
and crescent-shaped. The Maryland team also
came up with the possibility of a three-hotspot
solution: two southerly oval-shaped ones and
a final circle near the rotational south pole.

“It looks like they might have made the first
real detection of a pulsar where the beams are
not 180 degrees separated,” says Natalie Webb,
an astrophysicist at the Institute for Research
in Astrophysics and Planetology in Toulouse,
France, who has modelled such possibilities.
“That's fantastic if true.”

The results would bolster previous observations
and theories suggesting that neutron
stars’ magnetic fields, which are one trillion
times stronger than the Sun’s, can be more
complex than generally assumed. After they
first form, pulsars are thought to slow their
rotation over millions of years. But if they have
a companion star orbiting around them, they
might steal material and angular momentum
from this partner, boosting their spinning to 
superfast speeds. As the matter gets deposited 
on the star's exterior, some theorists suggest 
it could affect a fluid-like layer of subsurface 
neutrons, generating gigantic vortices that 
twist the neutron star’s magnetic field into 
odd arrangements. The companion might 
ultimately be consumed or lose so much mass
that it becomes gravitationally unbound and 
flies away, as could have been the case with the 
now-solitary J0030. 


Work in progress

NICER is continuing to observe J0030 to further
improve the precision of its radius measurements.
At the same time, the team is beginning
to analyse data from a second target, a slightly
heavier pulsar with a white-dwarf companion.
Other astronomers have used observations of 
this pair's orbital dance to determine the pul- 
sar’s mass, which means NICER researchers 
have an independent measurement that they 
can use to validate their findings. 

Among NICER's targets, the team plans to 
include at least a couple of high-mass pulsars,
including the current record-holder for most
massive neutron star, a behemoth with a
mass 2.14 times that of the Sun. That should
allow the researchers to probe an upper limit:
the point at which a neutron star collapses into
a black hole. Even the 2.14-solar-mass object
is challenging for theorists to explain. Several
researchers have also suggested that NICER
might be able to find two neutron stars with
the same mass but different radii. That would
suggest the presence of a transition point, at
which slight differences create two distinct
cores. One might contain mostly neutrons, for
example, and the other might be composed of
more-exotic material. 

Although NICER is at the vanguard, it is
not the only instrument plumbing pulsars’ 
depths. In 2017, the US Laser Interferome- 
ter Gravitational-Wave Observatory (LIGO), 
along with the Virgo detector in Italy, picked 
up the signal from two neutron stars crashing 
and merging together. As the objects rotated 
around one another before the crash, they 
emitted gravitational waves that contained 
information about the stars’ size and struc- 
ture. Each star's colossal gravitational influ- 
ence tugged on and deformed its partner, 
contorting both from spheres into teardrop 
shapes. The amount of distortion in those 
final moments gives physicists clues about 
the malleability of the material inside the 
neutron stars. 

LIGO's facility in Livingston, Louisiana, 
picked up a second neutron-star smash-up 
last April, and more events could be spotted 
at any time. So far, the two mergers have 
only hinted at the properties of neutron-star 
interiors, suggesting that they are not 
particularly deformable. But the current 
generation of facilities can't observe the crucial 
final moments, when the warping would be 
greatest and would display internal conditions 
most clearly. 

Core scenarios 
A number of possibilities have been suggested for 
the inner core, including these three options. 

Quarks: The constituents of protons and neutrons 
(up and down quarks) roam freely. 

Bose-Einstein condensate: Particles such as pions 
containing an up quark and an anti-down quark 
combine to form a single quantum-mechanical entity. 

Hyperons: Particles called hyperons form. Like 
protons and neutrons, they contain three quarks but 
include ‘strange’ quarks. 

The Kamioka Gravitational Wave Detector 
in Hida, Japan, is expected to come online 
later this year, and the Indian Initiative in 
Gravitational-wave Observations near Aundha 
Naganath, Marathwada, in 2024. In combination 
with LIGO and Virgo, they will improve 
sensitivity, potentially even capturing the 
details of the moments leading up to a crash. 

Looking further into the future, several 
planned instruments could make observations 
that elude NICER and current 
gravitational-wave observatories. A Chinese-
European satellite called the enhanced X-ray 
Timing and Polarimetry mission, or eXTP, is 
expected to launch in 2027 and study both 
isolated and binary neutron stars to help 
determine their equation of state. Researchers 
have also proposed a space-based mission that 
could fly in the 2030s called the Spectroscopic 
Time-Resolving Observatory for Broadband 
Energy X-rays, or STROBE-X. It would use 
NICER's hotspot technique, pinning down the 
masses and radii of at least 20 more neutron 
stars with even more precision. 

The hearts of neutron stars will probably 
always retain some secrets. But physicists 
now seem well placed to begin peeling back 
the layers. Read, who is a member of the LIGO 
team, says that she has collaborated on a 
project to imagine what scientific questions 
gravitational-wave detectors would be able 
to tackle in the 2030s and 2040s. In the process, 
she realized that the landscape for 
neutron-star research (in particular, the question 
of the equation of state) should look very 
different by then. 

“It’s been this long-standing puzzle that 
you figure will always be there,” she says. 
“Now we're at a point where I can see the 
scientific community figuring out the neu- 
tron-star-structure puzzle within this decade.” 


Adam Mann is a freelance journalist based in 
Oakland, California. 

Science in culture 


Books & arts 


Albert Einstein worked at a university in Prague in 1911-12. 


A fresh look at 
Einstein’s Prague period 


Michael Gordin’s elegant prose uses 16 months to build 
a panoramic view of a century. By Pedro Ferreira. 


Many people skip over the fact that, 
from early April 1911 to late July 1912, 
Albert Einstein lived in Prague. “It 
was, after all, such a short time, 
and quite early in the physicist’s 
career,” Michael Gordin explains at the start of 
Einstein in Bohemia. Historians have variously 
dismissed those 16 months as an interlude, a 


sojourn and a detour. 

So did I, before I read this improbably good 
book. 

Einstein in Bohemia 
Michael D. Gordin 
Princeton Univ. Press (2020) 

Multiple biographies of Einstein agree that 
the Prague period is notable for one main reason. 
Free of the heavy teaching load that had 
burdened him as an associate professor at the 
University of Zurich in Switzerland, he focused 
on developing his general theory of relativity. 
It was in Prague that Einstein came up with the 
idea of gravitational lensing, the concept that 
the pull of stars, planets and other astronomical 
objects would distort light rays. That idea 
is now central to modern astronomy, 
used to determine, for example, how much 
dark matter is hovering around clusters of galaxies. 
Although Einstein had yet to conjure up 
a dynamical theory of space-time, it was his 
prediction of gravitational lensing that would 
demonstrate the theory’s chops in 1919, when 
physicist Arthur Eddington and his colleagues 
measured how light from stars in the Hyades 
cluster was deflected by the gravitational pull 
of the Sun during a solar eclipse. 


Personal impact 

At a personal level, gaining the position of 
professor of theoretical physics at the German 
University in Prague pushed Einstein into the 
senior echelons of academia. Just three years 
before, he had been a patent clerk in Bern. 
Interestingly, Gordin tells us, he was not the 
first person to be offered the job: first refusal 
went to one Gustav Jaumann at the German 
Technical University in Brno (who remembers 
him?), who turned it down. 

It was also in Prague that Einstein's marriage 
to fellow physicist Mileva Marić began to fall 
apart. She was miserable there: snubbed as a 
Serbian and resentful at being dragged around 
for her husband's career, then left sitting at 
home while he travelled elsewhere to give 
talks and collaborate. It was during a trip to 
Berlin that he began an affair with his cousin 
Elsa Löwenthal, who eventually became his 
second wife. 

But that is about it. Slim pickings on which 
to base a whole book, I feared. Yet Gordin 
does something ingenious. He uses Einstein 
as a MacGuffin, a device that propels the plot 
but has little significance to the story he wants 
to tell. He explodes the narrative out of what 
he calls the “spacetime interval” of 1911-12 
to follow a host of figures who were involved 
with Einstein in Prague, in some cases very 
tangentially. In so doing, he careers through 
the history of ideas as well as the political 
turmoil of Bohemia (now part of the Czech 
Republic) during most of the twentieth 
century, touching on physics, philosophy, 
nationhood, anti-Semitism and the rise of 
Prague as a centre of intellectual life. 

Books in brief 

The Story of More 
Hope Jahren Vintage (2020) 
In 2009, palaeobiologist Hope Jahren was required to teach climate 
change. Initially reluctant, she soon conceived a vocation. Her 
compelling book uses statistics brilliantly to provoke self-examination. 
In sections on ‘Life’, ‘Food’, ‘Energy’ and ‘Earth’, it illuminates subjects 
from population growth to melting glaciers. If the whole planet 
consumed resources on the US scale, carbon dioxide emissions would 
be more than four times higher, she observes: “Using less and sharing 
more is the biggest challenge our generation will ever face.” 

The Incredible Journey of Plants 
Stefano Mancuso (transl. Gregory Conti) Other Press (2020) 
About 400 metres from ground zero in Hiroshima, a weeping willow 
and other plants regrew from their roots. Revered, they are labelled 
hibakujumoku, “trees that suffered an atomic explosion”, an elderly 
Japanese diplomat translates in flawless Italian for visiting plant 
neurobiologist Stefano Mancuso. Later, he confesses he is a hibakusha: 
he survived the strike because his classroom was protected by a 
curtain of trees. Such anecdotes enliven Mancuso’s quirky little global 
history, which argues that plants “are more sensitive than animals”. 

Footprints: In Search of Future Fossils 
David Farrier Farrar, Straus and Giroux (2020) 
Fossil footprints unmasked by a 2013 storm on the English coast 
revealed that hominins walked beside an estuary 850,000 years 
ago. Although quickly erased by the tide, they inspired David 
Farrier to consider modern civilization’s future footprints, including 
Neil Armstrong's marks on the Moon and the nuclear footprint: a 
geological repository for Finland's spent fuel. This is designed to be 
forgotten, unlike its US equivalent, which proposes to use warning 
signs modelled on Edvard Munch's 1893 painting The Scream. 

The Future of Brain Repair 
Jack Price MIT Press (2020) 
In 1996, neurobiologist Jack Price, then at a major pharmaceutical 
company, was invited to fund academic research into stem-cell 
therapies. He declined. Now an academic himself, he is more hopeful. 
In 2006, Shinya Yamanaka discovered how to make ‘pluripotent’ stem 
cells, enabling brain-like tissue to be generated in a dish, “albeit 
small, misshapen and underdeveloped”, as Price notes in his clear, 
honest but intellectually challenging account. Today, several therapies 
have entered clinical trials. But how to make them affordable? 

Radical Uncertainty 
John Kay and Mervyn King Bridge Street (2020) 
When Christopher Columbus sought a westerly route to the Indies, 
“whatever counted as cost-benefit analysis in the Spanish court took 
no account of the possibility of a New World”, say economists John 
Kay and Mervyn King. They refreshingly criticize their discipline for not 
recognizing that its use of ‘risk’, ‘uncertainty’ and ‘rationality’ doesn't 
match that of lay people. Odd, then, that their far-ranging book on 
“radical uncertainty” mentions Max Planck’s dalliance with economics 
but not Werner Heisenberg's uncertainty principle. Andrew Robinson 

There are quirky observations, almost 
worthy of playwright Tom Stoppard. For 
example, Einstein and writer Franz Kafka probably 
met at a 1911 cultural soirée in the house 
of Berta Fanta, a “philosophically ambitious” 
socialite who held a salon above her husband's 
pharmacy in Prague’s Old Town Square. 


Social circle 


But what really grips are the people. Take 
Oskar Kraus, a philosopher at the German 
University. Originally trained in law, he took 
against Einstein, writing countless articles in 
philosophy journals unpicking what he saw 
as egregious internal inconsistencies in 
relativity. His writing and stance foreshadowed 
the anti-relativity strand of the Deutsche 
Physik movement, an eviscerating force in 
German academia during the rise of the Third 
Reich. Kraus, who had been born into a Jewish 
family but converted to Protestantism, was 
arrested by the Gestapo and ultimately fled 
to Oxford, UK. 

Inevitably, Gordin takes in Ernst Mach, who 
had been in a post similar to Einstein’s at the 
German University’s forerunner from 1867 
to 1895. Mach had been “the most successful 
physicist in the university's history” and played 
an important role as rector for part of his tenure. 
But, like Einstein, he had been the second 
choice for the post. Mach’s ideas shaped the 
work of important relativists after Einstein, 
such as Dennis Sciama and Robert Dicke. 

Another pen portrait is of Einstein's 
successor in the post, physicist and philoso- 
pher Philipp Frank. His journey through the 
turbulent Prague of the 1930s serves as a spotlight 
on a place battered by historical forces. 
During the late 1920s and early 1930s, Frank 
was part of the Vienna Circle, a hugely influential 
group of scientists and philosophers that 
also included philosopher Rudolf Carnap and 
mathematician Kurt Gödel. In Prague, Frank 
did much to carry the flame of both Einstein 
and Mach's ideas through books and journal 
articles, publicly sparring with Kraus when- 
ever necessary. In 1938, he had to flee to the 
United States, where he ended up at Harvard 
University in Cambridge, Massachusetts, and 
wrote one of Einstein's first and most notable 
biographies. 

This is a panoramic view of twentieth-century 
Bohemia, with a sprinkling of Einstein. But what 
really carries it through is the beauty and force 
of Gordin’s prose. 


Pedro Ferreira is professor of astrophysics 
at the University of Oxford, UK, and author of 
The Perfect Theory. 

e-mail: pedro.ferreira@physics.ox.ac.uk 

Women carry coal from an open-cast mine in Jharkhand state in India. 


Emissions: world has four times 
the work or one-third of the time 


Niklas Höhne, Michel den Elzen, Joeri Rogelj, Bert Metz, Taryn Fransen, Takeshi Kuramochi, Anne Olhoff, Joseph Alcamo, 
Harald Winkler, Sha Fu, Michiel Schaeffer, Roberto Schaeffer, Glen P. Peters, Simon Maxwell & Navroz K. Dubash 


New synthesis shows what a 
wasted decade means for the 
climate pact made in Paris. 


The past decade of political failure on 
climate change has cost us all dear. It 
has shrunk the time left for action by 
two-thirds. In 2010, the world thought 
it had 30 years to halve global emissions 
of greenhouse gases. Today, we know that this 
must happen in ten years to minimize the 
effects of climate change. Incremental shifts 
that might once have been sufficient are no 
longer enough. 

The further bad news is that, even taken 
together, the proposed climate action by 
all countries is a long way from meeting this 
requirement. Rather than halving emissions by 
2030, countries’ climate proposals will lead to a 
slight increase. Worse still, individual countries 
are not on track to achieve commitments that 
were insufficient from the outset and are now 
woefully inadequate. 

The better news is that more countries, 
regions, cities and businesses are implement- 
ing the deep, rapid transformations that are 
urgently required. At scale, these could achieve 
the collective climate goals that nations agreed 
in Paris more than four years ago. There are 
lessons to be learnt from places such as Costa 
Rica, Shenzhen in China and Copenhagen that 
have made strides through the use of renewable 
energy and electrified transport. The United 
Kingdom (together with 75 other parties) and 
California have at least set ambitious goals 
to become carbon neutral, which might send 
signals to industry even before supporting 
policies are implemented. Meanwhile, 26 banks 
have stopped directly financing new coal-fired 
power plants (see go.nature.com/32uped2). 

Much is happening on the ground. The 
question is how to ramp up these activities 
fast enough to keep warming to less than 1.5 °C 
above pre-industrial levels. 

Here we present a snapshot of the extent to 
which nations’ individual pledges are inconsistent 
with their stated collective goals. We 
also note some of the pockets of promise. We 
draw our conclusions from a synthesis of all ten 
editions of the Emissions Gap Report produced 
by the United Nations Environment Programme 
(UNEP). Each year for the past decade, this 
report has examined the difference between 
what countries have pledged to do individually 
to reduce greenhouse-gas emissions, and what 
they need to do collectively to meet agreed 
temperature goals: the ‘gap’. 

Our analysis shows that the gap has widened 
by as much as four times since 2010. There are 
three reasons for this. First, global annual 
greenhouse-gas emissions increased by 14% 
between 2008 and 2018 (ref. 6). This means that 
emissions now have to decline faster than was 
previously estimated, because it is cumulative 
emissions that determine the long-term 
temperature increase. Second, the international 
community now agrees that it must ensure a 
lower global temperature rise than it decided 
ten years ago, because climate risks are better 
understood. And third, countries’ new climate 
pledges have been insufficient. 

Electric taxis at a charging station in Shenzhen, China. 

THE SEVEN TOP EMITTERS 

Country or region (2018 emissions in gigatonnes CO2 equivalent)* | Change in projected greenhouse-gas emissions by 2030 since 2015 | Potential reasons 
China (13.2) | No change | New climate and energy policies; altered growth projections. 
United States (6.6) | No change | Rollback of federal policies works against price drops in renewables and reductions in coal use. 
European Union (4.0) | Lower | Mostly attributable to implementation of new policies. 
India (3.8) | Slightly lower | Unclear. 
Russia (2.4) | No change | No change in policies or growth projections. 
Indonesia (2.3) | Higher | Higher emissions projections from deforestation. 
Brazil (1.6) | Higher | Higher emissions projections from deforestation. 

*Comparison of the 2015 and 2018 UNEP Emissions Gap Report and other sources provides information about changes 
in current policy projections for the leading emitters. Uncertainties for each estimate are large. See Supplementary 
information for details. 

The tenth anniversary of the report coincides 
with the 2020 milestone to which countries 
agreed in Paris. They undertook to communicate 
or update climate pledges, or ‘nationally 
determined contributions’, to the UN Framework 
Convention on Climate Change conference 
(COP26) this November in Glasgow, UK. 
Clearly, the promises must be overhauled 
(and then, crucially, kept) if the yawning gap 
between ‘talk’ and ‘walk’ is going to close by 2030. 


Gap minder 

The scope of the UNEP emissions gap reports 
has evolved over time, in line with climate 
policy. So what has changed during the past 
decade? 

In the 2009 Copenhagen accord and the 
2010 Cancun agreement, countries collectively 
pledged to limit warming to below 2 °C, 
and 73 countries individually pledged emissions 
targets for 2020. The 2015 Paris agreement, 
responding to mounting concern over 
the impacts of climate change, tightened the 
collective temperature limit to “well below 
2 °C” and agreed “to pursue efforts to limit the 
temperature increase to 1.5 °C” (ref. 9). Under 
the Paris deal, 192 parties individually pledged 
emissions targets, typically for 2030 (see ‘More 
and faster’). 

From 2010 to 2014, the gap reports projected 
the likely difference in 2020 between the 
expected result of countries’ pledges and the 
pathways towards 2 °C. The 2010 report 
documented a shortfall of 14%. Since 2015, the 
reports have forecast the expected shortfall 
in 2030 between the countries’ pledges and 
progress towards both 1.5 °C (current shortfall 
of 55%) and 2 °C (current shortfall of 25%; see 
‘More and faster’). The report also examines 
the policies that countries are implementing 
domestically. 

Had serious climate action begun in 2010, 
the cuts required to meet the emissions levels 
for 2 °C would have been around 2% per year, 
on average, up to 2030. Instead, emissions 
increased. Consequently, the required cuts 
from 2020 are now more than 7% per year on 
average for 1.5 °C (close to 3% for 2 °C). 

The time window for halving global emis- 
sions has also narrowed drastically. In 2010, 
it was 30 years; today, it is 10 years for 1.5 °C 
(25 years for 2 °C). Although many reports, 
scientists and policymakers continue to 
discuss rises of 2 °C, it must be emphasized that, in 
2018, the Intergovernmental Panel on Climate 
Change reported that warming of more than 
1.5 °C would be disastrous. 
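
As a rough illustration of what these shrinking windows imply, the sketch below assumes that emissions fall by the same percentage every year. That compounding model is our simplifying assumption, not the method of the gap reports, which derive their figures from scenario pathways, so the results will not match the reported rates exactly. The function name is hypothetical.

```python
def annual_cut_to_halve(years: float) -> float:
    """Constant yearly fractional cut that halves emissions after `years`.

    Assumes emissions shrink by the same fraction r each year, so that
    (1 - r) ** years == 0.5. This is an illustrative model only.
    """
    return 1.0 - 0.5 ** (1.0 / years)


# Halving windows quoted in the text: 30 years (as seen in 2010),
# 10 years (today, for 1.5 °C) and 25 years (today, for 2 °C).
windows = [("2010 view", 30), ("today, 1.5 °C", 10), ("today, 2 °C", 25)]

for label, years in windows:
    rate = annual_cut_to_halve(years)
    print(f"{label}: halve in {years} years -> about {rate * 100:.1f}% per year")
```

Under this assumption, a 10-year window demands a yearly cut roughly three times as steep as a 30-year window, which is the same order of difference as the text describes.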

Countries are not even on track to achieve 
their now plainly inadequate 2015 pledges. 
Of the G20 countries, seven (Australia, Brazil, 
Canada, Japan, South Korea, South Africa and 
the United States) need to implement existing 
policy or roll out new measures. (The United 
States has begun the process of withdrawing 
from the Paris agreement, and will leave in 
November.) Russia and Turkey have set them- 
selves unambitious targets that they can meet 
without new policies. 

Since 2015, estimated global emissions 
in 2030 have decreased by only 3%. For the 
leading seven emitters, 2030 estimates have 
slightly decreased, flatlined or increased (see 
‘The seven top emitters’). 

No single model can predict the future, and 
such analyses by necessity exclude the most 
recent developments. Nevertheless, it is clear 
that, collectively, current policies will not limit 
global warming to well below 2 °C, let alone 
1.5 °C, as agreed in Paris. 

Clearly, the annual audit of the emissions gap 
has not altered poor performance. The gap concept 
has nonetheless proved useful. The reports 
and numbers have continuously informed the 
UN climate summits and the emissions gap 
was noted as a serious concern when parties 
were adopting the Paris agreement. 


Transformative action 

Fundamental policy transformations have 
begun to appear in some sectors, countries, 
regions, cities and businesses over the past ten 
years. These innovations seek to achieve the 
UN's Sustainable Development Goals (SDGs), 
including climate ones. Slashing emissions 
now requires ‘leaving no one behind’. 


MORE AND FASTER 


Insufficient climate action during the past decade means that transformational development 
pathways are now required to reduce greenhouse-gas emissions on time. 

To recognize, monitor and understand these 
advances, the gap reports have included examples. 
Some are discussed here. 

Ambitious action. Most encouragingly, a 
wealth of agile nations, regions, cities and businesses 
have promised or made radical changes 
since the Paris agreement (see ‘Action gap’ and 
Supplementary Information; see also 
go.nature.com/2t22tth). At the last count, net-zero 
emissions goals have been set or are being considered 
by 76 countries or regions (the European 
Union is the largest) and 14 sub-national 
regions or states (the largest being California); 
some locations have begun implementation. 
Together, these places account for about 21% 
of global greenhouse-gas emissions. 

Fifty-three countries and 31 states and 
regions have explicitly committed to an 
emissions-free electricity sector. Seven additional 
countries have done so implicitly by aiming for 
net-zero greenhouse-gas emissions. Together, 
these account for around 18% of global 
electricity generation. Twenty-one countries, 
5 regions and more than 52 cities have 
committed to make all vehicles emissions-free. 
Individual examples also exist for sectors in 
which reaching zero emissions was thought 
to be difficult, such as heavy industry and 
aviation. Steel giants ThyssenKrupp in Essen, 
Germany, and SSAB in Stockholm are aiming 
for zero-emissions steel production by 2050 
and 2045, respectively. The building-materials 
company Heidelberg Cement, headquartered 
in Germany, is aiming for zero-emissions 
cement production by 2050. For aviation, 
Norway and Scotland hope to make short-haul 
and domestic flights zero emissions by 2040. 

Renewables. Costs of renewable energy 
are falling faster than expected. Renewables 
are currently the cheapest source of new 
power generation in most of the world. Solar 
and wind power will be financially more competitive 
than will existing coal plants by next 
year. These cost declines, and those of battery 
storage, are opening up possibilities for 
large-scale, low-carbon electrification. 

Coal consumption. The rise of renewable 
energy can, and must, facilitate a move away 
from coal. Emerging economies that depend 
on coal, such as China and India, have begun 
to address consumption by adjusting the 
fuel's price, capping its consumption, reducing 
plans for new coal-fired power plants and 
supporting renewables. Much more must be 
done, and quickly, while addressing poverty, 
energy access and urbanization. 

UN SDGs. Actions to reduce greenhouse-gas 
emissions are essential for achieving food 
security, healthy lives and many other SDGs, 
as confirmed by a growing body of research. 
For example, renewable energy cuts air pollution, 
and improves health and energy security 
compared with fossil fuels. 


Closing the gap 
These few success stories must be scaled up 
and mirrored with progress in every sector. 
The fact that reductions in greenhouse-gas 
emissions are a prerequisite to achieving 
sustainable development must propel action. 
The gap is so huge that governments, the 
private sector and communities need to switch 
into crisis mode, make their climate pledges 
more ambitious and focus on early and aggressive 
action. Otherwise, the Paris agreement's 
long-term goals are out of reach. We do not 
have another ten years. 

ACTION GAP 

Although 192 parties pledged various emissions targets under the Paris 
climate agreement, commitments to specific actions remain sparse*. 

Net-zero emissions goals set: 76 of 192 parties. 
This action by 76 parties accounts for 10.2 GtCO2e of 
global greenhouse-gas emissions (52.6 GtCO2e total global 
greenhouse-gas emissions). 

Stop fossil-fuel exploration and production: 6 of 192 parties. 
This action by 6 parties corresponds to 5 exajoules of 
global fossil-energy production (500 exajoules total global 
fossil-energy production). 

*For more details see ganatra.con/2020h and Supplementary information at ga.nstre.con/cbyh 

Readers respond 


Correspondence 


Fast peer review for 
COVID-19 preprints 


The public call for rapid sharing 
of research data relevant to 
the COVID-19 outbreak (see 
go.nature.com/2tllyp6) is 
driving an unprecedented 
surge in (unrefereed) preprints. 
To help pinpoint the most 
important research, we have 
launched Outbreak Science 
Rapid PREreview, with support 
from the London-based charity 
Wellcome. This is an open-source 
platform for rapid 
review of preprints related to 
emerging outbreaks (see 
https://outbreaksci.prereview.org). 

These reviews comprise 
responses to short, yes-or-no 
questions, with optional 
commenting. The questions 
are designed to capture 
structured, high-level input 
on the importance and quality 
of the research, which can 
be aggregated across several 
reviews. Scientists who have 
ORCID IDs can submit their 
reviews as they read the 
preprints (currently limited to 
the medRxiv, bioRxiv and arXiv 
repositories). The reviews are 
open and can be submitted 
anonymously. 

Outbreaks of pathogens such 
as the SARS-CoV-2 coronavirus 
that is responsible for COVID-19 
move fast and can affect anyone. 
Research to support outbreak 
response needs to be fast and 
open, too, as do mechanisms 
to review outbreak-related 
research. Help other scientists, 
as well as the media, journals 
and public-health officials, 
to find the most important 
COVID-19 preprints now. 


Michael A. Johansson Outbreak 
Science, San Juan, Puerto Rico. 
michael@outbreakscience.org 


Daniela Saderi PREreview, 
Portland, Oregon, USA. 


Ethics of editing 
human genomes 


As leaders of the national 
ethics committees of France 
and Germany, and of the UK 
Nuffield Council on Bioethics, 
we consider that the moral 
and societal issues raised by 
developments in heritable 
human-genome editing 
demand a level of public ethical 
reflection that current initiatives 
fail to meet. 

In a joint statement, we call on 
governments and stakeholders 
worldwide to ensure that 
heritable genome editing is 
brought within the control 
of relevant public authorities 
(see go.nature.com/3ckImc). 
Furthermore, no clinical 
applications should be 
considered until there has 
been broad societal debate 
about their acceptability and 
until research has reduced the 
considerable risks of clinical use 
to an acceptable level. Measures 
must be in place to ensure that 
these risks can be properly 
assessed and monitored. 

Moreover, any ethically 
permissible application of 
human genome editing should 
not increase disadvantage, 
discrimination or division in 
society. The large range of 
conceivable applications, as well 
as their implications for families, 
society and future generations, 
calls for cautious, responsible 
and transparent governance 
(see also go.nature.com/3c9fel). 


David Archard Nuffield Council 
on Bioethics, London, UK. 
d.archard@qub.ac.uk 


Peru’s research: 
CONCYTEC responds 


As president of Peru's National 
Council of Science, Technology 
and Technological Innovation 
(CONCYTEC), I disagree that 
the government is not showing 
sufficient interest in the 
country’s research (see Nature 
576, S65-S67; 2019). 

The government's 
expenditure on research and 
development has increased over 
the past decade, and this year 
sees its highest budget ever, at 
214 million soles (US$63 million; 
see go.nature.com/2ufuxlk, 
in Spanish). And some public 
universities are investing their 
royalties from natural resources 
such as mining into research 
infrastructure and projects. 

A 2018 report by Elsevier 
commissioned by CONCYTEC 
indicates that Peru’s field-weighted 
citation impact in 
2013-17 was above the world’s 
average. And, according to 
SCImago rankings, Peru's 
research is becoming less 
dependent on international 
collaborations, with more 
than 40% of its publications in 
2018 exclusively authored by 
Peruvian scientists. 

Last May, the government 
passed a law to attract and 
retain more highly qualified 
scientists. CONCYTEC, with 
the support of a World Bank 
project, incorporated 181 local 
and foreign researchers into 
Peruvian institutions in 2019. 

Notwithstanding these 
efforts, we recognize that we 
still have a long way to go in 
improving Peru's research. 


Social priming: a 
dubious term 


The great replicability mystery 
of ‘social priming’ in psychology 
(Nature 576, 200-202; 2019) 
turns out to reflect a mundane 
fact: priming studies (social 
or non-social) that use reliable 
methods are highly replicable, 
whereas those that don’t are not. 
In our view, it is time to dispense 
with the term once and for all. 

Social priming occurs when 
exposure to a social concept or 
stimulus affects later behaviour. 
One problem is that there is 
no clear social component 
to much of what is defined as 
social priming (in priming with 
numbers or the idea of death, 
for example). And many studies 
that are obviously social (such 
as priming with stereotypes) are 
excluded. 

Furthermore, those studies 
identified as social priming 
almost exclusively collect a 
single response to a single prime 
per subject, whereas others that 
collect hundreds of responses 
to multiple primes are excluded 
from analyses of social priming. 
Thus, social-priming studies 
have less power to detect real 
effects and are more prone to 
false positives. 
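
The link between responses per subject and statistical power can be sketched with a simple calculation. The example below is an illustration under our own assumptions, not an analysis from the letter or the cited preprint: it compares two groups of subjects using a normal approximation, and treats each subject's score as the mean of their trials, so that trial-to-trial noise averages out as more responses are collected. The function name and every numerical value (effect size, spreads, group size) are hypothetical.

```python
import math
from statistics import NormalDist


def power_two_groups(effect, n_per_group, trials_per_subject,
                     sd_between=1.0, sd_within=3.0, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Each subject's score is the mean of `trials_per_subject` responses,
    so its variance is sd_between**2 + sd_within**2 / trials_per_subject.
    Uses a normal (z-test) approximation; all defaults are illustrative.
    """
    sd_subject = math.sqrt(sd_between ** 2 + sd_within ** 2 / trials_per_subject)
    se_diff = sd_subject * math.sqrt(2.0 / n_per_group)
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_effect = effect / se_diff
    # Probability that the test statistic lands beyond either critical value.
    return (1.0 - std_normal.cdf(z_crit - z_effect)) + std_normal.cdf(-z_crit - z_effect)


single = power_two_groups(effect=0.5, n_per_group=30, trials_per_subject=1)
many = power_two_groups(effect=0.5, n_per_group=30, trials_per_subject=100)
print(f"one response per subject:  power = {single:.2f}")
print(f"100 responses per subject: power = {many:.2f}")
```

With these made-up numbers, the same true effect is detected far more often when each subject contributes many responses, which is the sense in which single-response designs have less power.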

Dozens of priming effects 
using social stimuli are designed 
to observe multiple behaviours 
and are highly replicable. But 
when a non-social priming 
study measures only a single 
response per subject, the effects 
are (unsurprisingly) weak and 
unreliable (see A. M. Rivers and J. 
W. Sherman Preprint at PsyArXiv 
http://doi.org/dng4; 2018). 

Biomechanics 


Ahead of the curve in the 
evolution of human feet 


Glen A. Lichtwark & Luke A. Kelly 


The longitudinal arch has long been considered a crucial 
structure that provides stiffness to the human foot. Now the 
transverse arch is stepping into the spotlight, with a proposed 
central role in the evolution of human foot stiffness. See p.97 


Humans evolved to walk and run effectively
on the ground using two feet. Our arched foot,
which is not a characteristic of other primates,
is a unique feature crucial for human bipedalism.
The arch provides the foot with the stiffness
necessary to act as a lever that transmits
the forces generated by leg muscles as they
push against the ground. The arch also retains
sufficient flexibility to function like a spring
to store and then release mechanical energy.
On page 97, Venkadesan et al. present a new
view of how foot stiffness is regulated. Their
finding not only has exciting implications for
understanding foot evolution, but also provides
a possible framework when considering
foot health and how to design better footwear.

The foot's longitudinal arch (the arch that
runs from the heel to the ball of the foot; Fig. 1)
is often credited with the leading role in
foot stiffening. The ligaments spanning this
arch, including the plantar fascia (or plantar
aponeurosis), act like a bowstring to resist
arch collapse when force is applied. Moreover,
the spring-like mechanical properties
of these ligaments contribute substantially to
the foot's ability to store and return energy.
However, Venkadesan and colleagues present
the idea that another arch component,
the transverse arch (the part of the arch that
curves across the foot at the base of the
metatarsal bones; Fig. 1), is at least as important
for foot stiffness as is the longitudinal arch,
if not more so. The authors provide evidence
for how transverse-arch curvature might help
prevent foot bending and therefore increase
foot stiffness. An analogy for this proposed
stiffening mechanism is the way that a pizza
slice becomes less floppy if the slice's outer
crust is curled up.
Venkadesan et al. initially took a theoretical
approach to investigate the role of transverse
curvature in stiffening the foot. Modelling
an elastic shell, the authors demonstrated
that increasing the transverse curvature of the
shell increased the stiffness of the
shell in the longitudinal direction. Venkadesan
and colleagues derived a parameter for curvature
and longitudinal stiffness (independent
of other factors such as shell size and thickness),
and showed that a distinct transition point
exists beyond which the amount of curvature
directly influences the longitudinal stiffness. A
similar relationship exists for a physical model
consisting of discrete rigid elements (analogous
to the metatarsal bones) connected by
springs (corresponding to ligaments).
To test whether this model might be relevant
to the stiffness of the human foot arch, the
authors examined human cadaver specimens
(frozen after death and then thawed to
combat stiffening due to rigor mortis) and
cut ligaments in the transverse arch that are
expected to be crucial for coupling the curvature
of this arch to foot stiffness. Venkadesan
et al. then assessed the foot's vertical
deformation when loads were applied. Cutting the
transverse ligaments reduced foot stiffness
by the remarkable value of more than 40%.
By comparison, previous research indicates
that cutting the foot's plantar fascia, which
spans the longitudinal arch, reduces stiffness
by just 23%. Venkadesan and colleagues' data
therefore suggest that transverse ligaments
make a substantial contribution to overall foot
stiffness. When bearing a load, the foot's
transverse ligaments are presumably stretched by
the resultant spreading out of the metatarsals
at the ball of the foot. The authors suggest that
this ligament stretching is a direct result of
transverse-arch curvature.

Venkadesan and co-workers examined
the evolution of the transverse arch across
different primates, including various species
of extinct hominin (those species more
closely related to humans than to chimpanzees).
As in other work investigating foot
evolution, Venkadesan et al. focused on the
amount of torsion (twist) in the fourth
metatarsal bone. They estimated the curvature of
the transverse arch and determined which
species would probably have had sufficient
curvature to induce stiffening of this arch to
an extent similar to that of modern humans.
For example, the authors examined the species
Australopithecus afarensis. This species
existed more than three million years ago,
and whether it walked upright in a human-like
fashion is debated. Venkadesan et al.
report that the transverse arch of A. afarensis
was less curved than that of a human foot and
thus, according to their model, probably less
stiff. However, the authors correctly emphasize
that such curvature alone cannot be used
reliably to infer movement capabilities, and
other mechanisms might stiffen the foot
sufficiently to allow a human-like gait.

Figure 1 | Human foot arches. The longitudinal arch of the human foot has been proposed to have a
key role in providing stiffness for the foot, an attribute that enables humans to walk on the ground on two
feet. Venkadesan et al. report that another foot arch (the transverse arch, which is in the vicinity of the
metatarsal bones) makes a major contribution to foot stiffness.

The curvature of transverse arches in human
populations probably spans a wide range of
values. Some people have noticeably flat
feet whereas others have a high arch. Perhaps
those with flat feet have less curvature
of their transverse arch and thus potentially
reduced stiffness in their feet compared with
those whose feet are less flat. But it is also
possible that people with flat feet have sufficient
transverse-arch curvature to compensate for
their low longitudinal arch, thereby maintaining
sufficient stiffness for effective walking
and running. Given that Venkadesan and
colleagues' work did not directly test whether
there is a relationship between transverse-arch
curvature and the stiffness of the human
foot, it remains to be determined whether the
range of differences in human transverse-arch
curvature is a crucial functional parameter to
explain foot stiffness.

The range of curvature of the arch of human
feet suggested by Venkadesan et al. would indicate
that a nearly twofold change in stiffness is
possible as a result of natural variation in
curvature of the transverse arch from one person
to the next. However, any relationship between
transverse-arch curvature and stiffness is
probably not enough to completely explain
the regulation of foot stiffness, and other
factors will also need to be considered, for
example the stiffness of the plantar fascia or
the potential for muscles to actively regulate
arch stiffness. As such, caution is necessary
before relying on this curvature parameter
alone as the key variable in assessing human
foot stiffness.

The fields of evolutionary biology, sports
science and medicine have largely neglected
the transverse arch when trying to explain the
management of loads applied to the foot.
Venkadesan and colleagues' research suggests
a new mechanism that links foot form
and function and sets the scene for a possible
shift in how the human foot is considered.
More research will be needed to better understand
how the transverse arch contributes to
human locomotor performance, including
determining what its contribution is to an
individual's foot stiffness and whether this
provides any mechanical or energetic benefits.
It is conceivable that new treatments that take
advantage of transverse-arch curvature to
modulate foot stiffness could be developed
for various foot disorders. Perhaps even more
exciting are the implications of this work for
efforts to mimic a human foot when designing
prosthetic limbs or legged robots.


In-sensor computing
for machine vision


Yang Chai 


An image-sensor array has been developed that acts as its
own artificial neural network to capture and identify optical 
images simultaneously, processing the information rapidly 
without needing to convert it to a digital format. See p.62 


Sight is one of our most vital senses.
Biologically inspired machine vision has developed
rapidly in the past decade, to the point
that artificial systems can ‘see’ in the sense of
gaining valuable information from images
and videos, although human vision remains
much more efficient. On page 62, Mennel et al.
report a design for a visual system that, rather
like the brain, can be trained to classify simple
images in nanoseconds.

Modern image sensors such as those in
digital cameras are based on semiconductor
(solid-state) technology and were developed
in the early 1970s; they fall into two main types,
known as charge-coupled devices and active-pixel
sensors. These sensors can faithfully
capture visual information from the environment,
but generate a lot of redundant data.
This vast amount of optical information is usually
converted to a digital electronic format
and passed to a computing unit for image
processing.

The resulting movement of massive
amounts of data between sensor and processing
unit results in delays (latency)
and high power consumption. As imaging
rates and numbers of pixels grow, bandwidth
limitations make it difficult to send
everything back to a centralized or cloud-based
computer rapidly enough for real-time
processing and decision-making, which is
especially important for delay-sensitive
applications such as driverless vehicles,
robotics or industrial manufacturing.

A better solution would be to shift some 
of the computational tasks to the sensory 
devices at the outer edges of the com- 
puter system, reducing unnecessary data 
movement. And because sensors normally 
produce analog (continuously varying) out- 
puts, analog processing would be preferable 
to digital: analog-to-digital conversion is 
notoriously time- and energy-consuming. 

To mimic the brain’s efficient processing 
of information, biologically inspired neuro- 
morphic engineering adopts a computing 
architecture that has highly interconnected 
elements (neurons, connected by synapses), 
allowing parallel computing (Fig. 1a). These 
artificial neural networks can learn from their
surroundings by iteration, for instance by learning
to classify something after being shown
known examples (supervised learning), or
to recognize a characteristic structure of an
object from input data without extra information
(unsupervised learning). During learning,
an algorithm repeatedly makes predictions
and strengthens or weakens each synapse 
in the network until it reaches an optimum 
setting. 

Mennel and co-workers implement an
artificial neural network directly in their image
sensor. On a chip, they construct a network of
photodiodes (tiny, light-sensitive units), each
consisting of a few atomic layers of tungsten
diselenide. This semiconductor's response
to light can be increased or decreased by
altering an applied voltage, so that the sensitivity
of each diode can be individually tuned.
In effect, this turns the photosensor network
into a neural network (Fig. 1b) and allows it to
carry out simple computational tasks. Changing
the light responsivity of a photodiode
alters the connection strength (the synaptic
weight) in the network. Thus, the device
combines optical sensing with neuromorphic
computing.

The authors arrange the photodiodes into a
square array of nine pixels, with three diodes
to each pixel. When an image is projected onto
the chip, various diode currents are produced,
combined and read. The hardware array
provides a form of analog computing: each
photodiode generates an output current that
is proportional to the incident light intensity,
and the resulting currents are summed along
a row or column, according to Kirchhoff's law
(a fundamental rule of currents in circuits).
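
The weighted summation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `sensor_currents`, the wiring assumption that one diode from every pixel feeds each of three output lines, and all numerical values are hypothetical.

```python
def sensor_currents(responsivity, light_power):
    """Return one summed current per output line.

    responsivity[m][n] is the tunable (possibly negative) response of the
    diode in pixel n that feeds output line m; light_power[n] is the light
    intensity on pixel n. Each diode current is responsivity * intensity,
    and currents that share a line add up (Kirchhoff's current law).
    """
    return [sum(r * p for r, p in zip(row, light_power))
            for row in responsivity]


# Hypothetical 3 x 3 image, flattened to nine pixel intensities.
image = [1, 0, 1,
         1, 0, 1,
         0, 1, 0]

# Hypothetical responsivities: three output lines x nine pixels.
weights = [
    [0.5] * 9,
    [1, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, -1, 0, 0, 1, 0, 0, -1, 0],
]

currents = sensor_currents(weights, image)
print(currents)
```

In this picture the image is the input vector, the responsivities are the synaptic weights, and the three summed currents are the outputs of a single-layer network computed in the analog domain.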

The array is then trained to perform a task.
The discrepancy between the currents produced
by the array and the predicted currents
(the currents that would be produced if the
array responds correctly to the image, for a
given task) is analysed off-chip and used to
adjust the synaptic weight for the next training
cycle. This learning stage takes up time and
computing resources, but, once trained, the
chip performs its set task rapidly.
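
A training cycle of this kind can be sketched as follows. The example is a hedged illustration only: it substitutes a simple delta rule (least-mean-squares update) for whatever algorithm the authors actually used, and the letter patterns, target currents, learning rate and function names (`forward`, `train`, `classify`) are all assumptions made for the sketch.

```python
def forward(weights, pixels):
    """Summed current on each output line for one image."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]


def train(images, targets, epochs=500, rate=0.05):
    """Off-chip delta-rule training of a responsivity matrix (assumed rule)."""
    n_out, n_pix = len(targets[0]), len(images[0])
    weights = [[0.0] * n_pix for _ in range(n_out)]
    for _ in range(epochs):
        for pixels, target in zip(images, targets):
            currents = forward(weights, pixels)
            for m in range(n_out):
                # Discrepancy between desired and produced current.
                error = target[m] - currents[m]
                for n in range(n_pix):
                    weights[m][n] += rate * error * pixels[n]
    return weights


def classify(weights, pixels):
    """Index of the output line that carries the largest current."""
    currents = forward(weights, pixels)
    return currents.index(max(currents))


# Three hypothetical 3 x 3 letter patterns, flattened row by row.
images = [
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
]
# One target current pattern per class (one-hot, an assumption).
targets = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

trained = train(images, targets)
predictions = [classify(trained, image) for image in images]
print(predictions)
```

The loop is the slow part, which matches the point made in the text: once the weights are fixed, classification needs only one pass of weighted sums.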

Using different algorithms for the neural
network, the authors demonstrate two neuromorphic
functions. The first is classification:
their 3 × 3 array of pixels can sort an image into
one of three classes that correspond to three
simplified letters, and thus identify which
letter it is in nanoseconds. This relatively simple
task is just a proof of concept, and could be
extended to recognizing more-complicated
images if the array size were scaled up.

The second function is autoencoding: the
computing-in-sensor array can produce a simplified
representation of a processed image by
learning its key features, even in the presence
of signal noise. The encoded version contains
only the most essential information, but can
be decoded to reconstruct an image close to
the original.

There is more to be done before this 
promising technology can be used in practical 
applications. A neuromorphic visual system
for autonomous vehicles and robotics will 
need to capture dynamic images and videos 
in three dimensions and with a wide field of 
view. Currently used image-capture technol- 
ogy usually translates the 3D real world into 
2D information, thereby losing movement 
information and depth. The planar shape of 
existing image-sensor arrays also restricts the
development of wide-field cameras.

Imaging under dim light would be difficult
for the device described by the authors. A
redesign would be needed to improve light
absorption in the thin semiconductor and
to increase the range of light intensities that
can be detected. Furthermore, the reported
design requires high voltages and consumes
a lot of power; by comparison, the energy
consumption per operation in a biological neural
network is at the sub-femtojoule level (10- to
10" joules). It would also be useful to expand
the response to ultraviolet and infrared light,
to capture information unavailable in the
visible spectrum.

Figure 1 | Computing within a vision sensor for intelligent and efficient preprocessing. a, In
conventional artificial-intelligence (AI) vision sensors, signals are collected from light-responsive sensors,
converted from analog to digital form (ADC, analog-to-digital converter), amplified and then fed as inputs to
an external artificial neural network (ANN): layers of interconnected computational units (circles) whose
connections can be adjusted, allowing the network to be trained to perform tasks such as classifying images.
An input layer of the ANN receives signals encoding simple physical elements (represented here by dots and
lines); in subsequent layers, these are optimized to mid-level features (simple shapes); and refined images
are formed at the output layer (3D shapes). The overall response can be slow and energy-hungry. b, Mennel
et al. report a system in which interconnected sensors (squares) on a chip not only collect signals, but also
work as an ANN to recognize simple features, reducing movement of redundant data between sensors and
external circuits.

The thin semiconductors used are difficult
to produce uniformly over large areas, and are
hard to process so that they can be integrated
with silicon electronics, such as external circuits
used for readout or feedback control. The
speed and energy efficiency of devices that
use these sensors will be dominated not by the
image-capturing process, but by data movement
between sensors and external circuits.
Moreover, although the computing-in-sensor
unit collects and computes data in the analog
domain, reducing analog-to-digital conversions,
the peripheral circuits still suffer from
other intrinsic delays. The sensors and external
circuits will need to be co-developed to
decrease the latency of the entire system.

Mennel and colleagues' computing-in-sensor
system should inspire further
research into artificial-intelligence (AI)
hardware. A few companies have developed AI
vision chips based on silicon electronics, but
the chips' intrinsic digital architecture leads
to problems of latency and power efficiency.

More broadly, the authors' strategy is not
limited to visual systems. It could be extended
to other physical inputs for auditory, tactile,
thermal or olfactory sensing. Development
of such intelligent systems, together with the
arrival of the 5G fast wireless network, should
allow real-time edge (low-latency) computing
in the future.


Microbiology


One gene to rule them all
in a chronic brain infection


Eva-Maria Frickel 


A gene has been found that controls the conversion of the
parasite Toxoplasma gondii into a form that chronically 
infects the human brain. The discovery could aid the design 
of therapies to eliminate this currently untreatable infection. 


It is estimated that around one-third of the
global human population is infected with
the single-celled organism Toxoplasma
gondii, a parasite that can be ingested in food
or picked up from activities such as gardening.
The parasite needs to differentiate into a
chronic-stage form to establish a permanent
infection in brain and muscle tissue, but how
this parasite conversion occurs has been a
mystery. Writing in Cell, Waldman et al. report
the identification of a gene that encodes a
master regulator of this differentiation event.

Toxoplasma gondii can infect any warm-blooded
animal. Human infection can occur
through eating undercooked meat from
infected livestock or by ingesting contaminated
food or water. Within a couple of weeks
of entering its host, T. gondii is converted from
a form called an acute-stage tachyzoite into
a bradyzoite, which establishes a chronic
infection (Fig. 1). A bradyzoite forms a cyst
that resides in host cells and is surrounded by
a thick wall of proteins and sugars. The wall
is a formidable barrier that makes the cyst
inaccessible and thwarts its elimination by
drugs or the host's immune system.

Although T. gondii infection is widespread
in human populations, it is often harmless,
being in the relatively quiescent state of
bradyzoites that have not reverted to the activated
tachyzoite form associated with disease.
However, T. gondii infection can be life-threatening
for unborn fetuses or for people whose
immune systems are compromised. Moreover,
in the United States, 2% of T. gondii infections
result in sight problems or blindness owing to
ocular damage caused by treatment-resistant
parasites.

To uncover the signal that controls the
formation of bradyzoites, Waldman and
colleagues engineered T. gondii to express
a green fluorescent protein if such cysts
formed. Monitoring the fluorescent protein
using microscopy and cell-sorting technologies
offered a way of assessing whether
the parasite had differentiated into the
form associated with chronic infection.
Exposure to stress-inducing treatment
in culture conditions, such as an alkaline
pH, made the parasite differentiate into
bradyzoites. The authors used the gene-editing
tool CRISPR to disrupt selected genes to assess
whether any of them affected differentiation.
The results were stunning and clear.
The disruption of only one targeted gene,
which the authors call bradyzoite-formation
deficient 1 (BFD1), prevented the formation
of bradyzoites.

BFD1 encodes a transcription-factor
protein belonging to a family known as
Myb-domain-containing proteins. Waldman
et al. demonstrate that the Myb domain of
BFD1 protein drives T. gondii differentiation.
This is particularly intriguing because another
Myb-domain-containing protein controls
chronic-stage cyst formation in the parasite
Giardia lamblia. Moreover, a related member
of the Myb-domain-containing protein family
enables Plasmodium parasites to develop in
red blood cells. In addition to BFD1, T. gondii
encodes 13 other Myb-domain-containing
proteins. Identifying their functions and determining
whether any of them aid the infection
process should be a priority.

Waldman and co-workers report that
T. gondii lacking BFD1 fail to establish a
chronic infection in mice. When investigating
the regulation of BFD1 expression, the authors
made the counter-intuitive discovery that
the messenger RNA that encodes BFD1 was
expressed at a similar level during both the
acute and chronic stages of infection. The presence
of BFD1 is sufficient to drive parasite
differentiation into a bradyzoite, and the mRNA
encoding BFD1 is preferentially translated into
protein during the chronic stage of infection.
Leveraging this finding, Waldman et al. engineered
T. gondii to express a form of BFD1 that
is unstable unless a specific compound is also
given. Consistent with the authors' model, the
compound-mediated stabilization of BFD1
caused the parasite to form a bradyzoite.
This discovery raises the question of how the
translation of the mRNA that encodes BFD1 is
regulated, possibly in response to stress, to
trigger chronic infection.

Figure 1 | How Toxoplasma gondii parasites differentiate to cause a chronic infection. a, Toxoplasma
gondii infects humans, and can be life-threatening. During the initial stages of infection, the parasite exists
in the bloodstream in a form called an acute-stage tachyzoite, which is in a vacuole. It is taken up by a host
cell (not shown) and the cell and vacuole subsequently burst. The parasite enters the brain and gives rise to
a chronic infection. Such infection occurs when the parasite differentiates into a form called a chronic-stage
bradyzoite. b, Waldman et al. report that the gene BFD1 is required for this differentiation step. In both the
acute and chronic stages of infection, this gene is transcribed into messenger RNA. However, the encoded
protein BFD1 is preferentially made during chronic infection. BFD1 is a transcription-factor protein that can
drive the expression of genes needed for the formation of bradyzoites.

As expected, the authors observed that
T. gondii parasites differentiated into
bradyzoites after several rounds of replication
in host cells in vitro under stressful conditions
(in vivo stress arises, in part, from the host's
mounting immune response). This process
was not synchronous across all parasites
being cultured or even for those in one host
cell. The researchers therefore used single-cell
RNA profiling of wild-type and BFD1-deficient
parasites to assess gene-expression profiles
associated with the differentiation event. They
also investigated the regions of the parasite
genome to which BFD1 binds. Gratifyingly,
as expected for a transcription factor, BFD1
bound to gene regions called transcription
start sites, and, in particular, to those in a large
set of genes that the authors had identified as
being expressed at higher than normal levels
during differentiation.

Many questions remain unanswered regarding
BFD1's regulation of differentiation, and
how it might act upstream of a group of previously
identified transcription factors called
ApiAP2s, which are important, but not sufficient,
for differentiation. Considering that
BFD1 is probably regulated by translational
control, approaches that determine the RNA
content of single cells might not be enough
to identify the full cohort of factors driving
differentiation. Another way to investigate
translational control is to profile RNAs bound
to the translational machinery of the ribosome
complex. This method has already been used
for T. gondii, and should be enlisted to study
bradyzoites.

Bradyzoites can now be maintained in host
cells grown in vitro without adversely affecting
the host cells, opening many vistas for future
experiments. Particularly exciting is the possibility
of analysing bradyzoites during brain
infection by using an approach that harnesses
stem-cell technologies, such as those that produce
neuronal stem cells. CRISPR provides a
way of testing the role of host genes, and this
method can also target T. gondii both in vitro
and in vivo. The availability of these tools
sets the stage for new discoveries about the
interplay between the parasite, host and
immune system throughout the acute and
chronic stages of infection. The development
of artificial-intelligence methods that enable
computer-driven assessments of complex
and subtle differences in images of T. gondii
offers another way of assessing the infection
process.

Given that bradyzoites are the most relevant
and challenging stage of the T. gondii life cycle
to tackle for the treatment of the human disease,
targeting BFD1 shows real potential for
making progress in the development of drugs
or vaccines. The discovery of one gene that
can rule them all moves us closer to solving
the riddle of this chronic infection.


A self-activating
orphan receptor


Brian Krumm & Bryan L. Roth 


The first 3D structure of a full-length G-protein-coupled
receptor whose natural activator is unknown has been 
determined, providing insights into an unusual mode of 
activation and a basis for discovering therapeutics. See p.152 


G-protein-coupled receptors are the largest
class of membrane protein in the human
genome, and represent the most abundant
pharmaceutical targets. More than 800 such
receptors are known in humans, of which perhaps
100 are orphan receptors, those for
which the naturally occurring (endogenous)
ligand molecules that bind to and activate
them have yet to be identified. This lack of
understanding of orphan G-protein-coupled
receptors (oGPCRs) impedes our ability to
exploit their potential as therapeutic targets.
On page 152, Lin et al. close this gap in knowledge
by reporting the first 3D structure of a
full-length oGPCR, GPR52, in multiple states.

GPR52 is a potential drug target for treating
several neuropsychiatric disorders, including
Huntington's disease and schizophrenia. When
activated, it selectively binds to the Gs family
of G proteins inside cells, and thereby stimulates
the production of cyclic AMP (cAMP)
signalling molecules, which regulate various
cellular processes. Efforts to find drugs that
target GPR52 would benefit from a greater
knowledge of how the receptor couples to Gs
and its activation process.

Lin et al. began their investigation of the
structural basis for GPR52 activation using
X-ray crystallography. In their initial studies,
the authors used a variety of strategies,
including extensive protein engineering, to
both stabilize the receptor and enable its
production in sufficient quantities to produce
high-resolution crystal structures. The
researchers thus obtained the structures of
human GPR52 in the ligand-free (apo) state
and in complex with c17, a synthetic molecule
that acts as an agonist (that is, it activates the
receptor).

Figure 1 | Binding sites in the receptor GPR52. Lin
et al. report structures of the membrane receptor
GPR52, a potential drug target for which the
putative naturally occurring agonist (the ligand
molecule that activates the receptor) is unknown.
The authors find that a region of GPR52 known as
extracellular loop 2 (ECL2) binds to a site in the
receptor that is analogous to the agonist-binding
site in other receptors from the same family. ECL2
seems to activate the receptor, removing the need
for an external agonist. The authors also find that
the synthetic molecule c17, which activates GPR52,
binds to a different region next to the site bound
by ECL2, and might therefore be an allosteric
modulator (a compound that potentiates the
activity of the receptor but does not bind at the
agonist binding site).

Not unexpectedly, GPR52-apo adopts
the GPCR architecture that has been seen
in many other structures, involving seven
transmembrane domains. Surprisingly, a
region of the receptor known as extracellular
loop 2 (ECL2) folds into what would normally
be the binding site for an endogenous ligand
(the orthosteric binding site), where it acts
as a lid that blocks the entrance to this site
(Fig. 1). Lin et al. observed that the activity of
GPR52 is significantly diminished when ECL2
is mutated or deleted, indicating that the
loop is essential for signalling activity in the
receptor's native environment. Meanwhile,
the crystal structure of the receptor in complex
with c17 suggests that this agonist binds
to a ‘side pocket’ that has not been observed
in previously reported structures of GPCRs.
The authors therefore speculate that c17 acts
allosterically (at a site remote from the orthosteric
binding site) to potentiate GPR52's
activity.

Remarkably, the authors were then able to
form a stable complex of GPR52 with a modified
Gs protein in the absence of an agonist, and
to obtain the structure of the complex using
cryo-electron microscopy. The receptor in
this complex has the structural hallmarks of
previously visualized, active GPCRs captured
in complex with G proteins. The arrangement
of ECL2 in this active-state structure is the
same as in the crystal structure of GPR52-apo,
implying that ECL2 acts as a ‘tethered
agonist’ under physiological conditions to
facilitate signalling pathways in the absence
of an endogenous agonist, similarly to the
behaviour of some other GPCRs, such as
the PAR1 protease-activated receptor.

Most GPCRs have some basal (constitutive)
activity wherein they spontaneously couple
to their particular G proteins. The constitutive
activity of GPR52 is exceptionally
high. Indeed, Lin and colleagues find that
GPR52's basal activity is so great that the
receptor's ability to signal by increasing
cAMP levels is only slightly augmented by
the addition of c17.

The authors report that this high level of
constitutive activity is achieved by at least two
structural features that are unusual for GPCRs:
the lack of a binding site for sodium ions, and
the occupation of an apparent agonist-binding
site by the tethered agonist in ECL2. The
sodium-binding site of GPCRs is known to be important
for damping constitutive activity, and so
the observation that a GPCR that lacks such
a site has a high level of basal activity is not
entirely surprising. By contrast, the discovery
of a tethered agonist that helps to maintain
GPR52 in the active state in the absence of
an external agonist is truly striking. The new
findings raise the intriguing possibility that,
for at least some oGPCRs, the incorporation of
agonists within the receptor itself obviates the
need for external ligands. Indeed, several other
oGPCRs that have high constitutive activities
have been identified, along with others that
don't have sodium-binding sites.

It should be kept in mind that, as with all structural studies, Lin
and colleagues’ work has provided only a few snapshots of the
receptor structure. Further biochemical and biophysical studies will
be essential to work out the details of GPR52’s dynamic behaviour
under physiological conditions.

Nevertheless, the authors’ high-resolution structures should aid the
development of drugs that selectively target GPR52, but avoid other
potential drug targets, for instance by enabling computational
studies in which ultra-large libraries of potential ligands are
docked into the binding site revealed by the structures. Moreover, if
the approaches used by Lin et al. for the structural elucidation of
GPR52 are applied to other oGPCRs that have high constitutive
activity, they might transform our understanding of oGPCRs and
accelerate their therapeutic exploitation.


Ecology


Biodiversity theory backed by island bird data


Kostas A. Triantis & Thomas J. Matthews 


Analysis of a unique global data set reveals how the species
diversity of birds is affected by the properties of archipelagos,
and offers a way to test an influential theory. Has this improved
our understanding of island biodiversity patterns? See p.92


The thousands of islands in the Aegean Sea between Greece and Turkey
have inspired countless myths and works of literature. This region is
also where the word archipelago, which means a group of islands, has
its roots. Archipelagos and their constituent islands have long been
viewed as natural ‘laboratories’ for developing and testing theories
that aim to answer key questions about biodiversity. On page 92,
Valente et al. report an impressive analysis of birds on archipelagos
worldwide that provides some of these long-awaited answers.

In the 1960s, the biologists R. H. MacArthur and E. O. Wilson
proposed the theory of island biogeography, which is commonly used to
explain observed patterns of species richness (the number of
different species) on islands. This development marked the dawning of
a renaissance for biogeography (the study of species distributions
over space and time) that advanced this field from a largely
descriptive endeavour to a quantitative and predictive science.

The theory of island biogeography was inspired by two
well-established patterns of species diversity. One pattern is that
species richness increases if a greater area is sampled. The other
pattern is that the species richness of an island is lower the
greater the isolation of the island, that is, the farther away the
island is from a potential source of species, such as the closest
mainland. The theory of island biogeography predicts that the species
richness observed on an island is the result of the interplay between
three fundamental processes: extinction, colonization (the dispersal
and establishment of species from the continental landmass to an
island) and speciation (the generation of new species). It also
predicts that these processes depend on island area and isolation.
This theory has had a wide-reaching influence on researchers in
fields including ecology and conservation biology, and has
underpinned the emergence of subdisciplines in these fields, such as
macroecology and metapopulation biology.
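
The balance described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not taken from Valente and colleagues’ analysis: richness S follows dS/dt = c(P − S) − eS for a mainland pool of P species, colonization rate c decays exponentially with isolation, extinction rate e falls as a power law of area, and speciation is ignored. All function names and parameter values are hypothetical.

```python
import math

def equilibrium_richness(pool_size, colonization, extinction):
    """Equilibrium of dS/dt = c*(P - S) - e*S, which is S* = P*c/(c + e)."""
    return pool_size * colonization / (colonization + extinction)

def colonization_rate(isolation_km, c0=0.1, scale_km=500.0):
    # Assumed form: colonization decays exponentially with distance to the mainland.
    return c0 * math.exp(-isolation_km / scale_km)

def extinction_rate(area_km2, e0=0.5, exponent=0.25):
    # Assumed form: extinction falls as a power law of island area.
    return e0 * area_km2 ** (-exponent)

def richness(area_km2, isolation_km, pool_size=100):
    """Equilibrium richness of an island under the assumed rate functions."""
    c = colonization_rate(isolation_km)
    e = extinction_rate(area_km2)
    return equilibrium_richness(pool_size, c, e)
```

Under these assumptions, a larger island holds more species than a smaller one at the same distance, and a near island holds more than a remote one of the same size, which matches the two patterns that inspired the theory.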

Yet despite a multitude of studies testing the theory of island
biogeography, few have sought to use molecular phylogenies to
directly test on a global scale the dependency of extinction,
colonization and speciation on island area and isolation. Valente and
colleagues provide such a test. They focused on terrestrial birds,
excluding migratory species, and gathered an impressive data set of
491 species across 41 archipelagos worldwide.

Building on their previous work investigating mechanisms that
generate island biodiversity, the authors applied an innovative
modelling approach that combined molecular phylogenetic data with
information on the spatial distribution of birds. The authors
obtained genetic data from 90 species across different archipelagos,
including 110 island populations not previously sampled. Valente and
colleagues also sampled genetic data for the closest
mainland-dwelling relatives of several of these island species. After
combining their data with pre-existing data, the authors built
phylogenetic trees showing the evolutionary relationships between
species. Using these phylogenies, they were able to estimate
colonization, extinction and speciation rates. The authors also
included species known to have been driven to extinction by humans,
because excluding such species impedes our understanding of natural
processes and biodiversity patterns.

The authors’ models, which used rates estimated at the archipelago
level, have high explanatory power and confirm several key
predictions of the theory of island biogeography: namely, that
extinction rates decline with increasing island area, colonization
rates decline with increasing distance from the island to the
continent, and speciation rates increase with the area and isolation
of islands. The authors studied two types of speciation (Fig. 1)
separately: anagenesis (in which a new species arises when an island
population diverges from its ancestral species on the continent to
become a different species) and cladogenesis (in which an ancestral
species splits into two or more different species). They found that
anagenesis increases with island isolation, and cladogenesis
increases on larger, more isolated islands. These findings will help
future studies that attempt to answer long-debated questions, such as
why only certain animal and plant groups speciate extensively, and
whether there are upper limits to the species richness and speciation
rates in specific regions of the globe.

Valente and colleagues have not only advanced our understanding of
the laws governing species richness on islands, they have also
confirmed several predictions of the theory of island biogeography.
As the authors mention, the next step will be to apply their
analytical framework to other island-dwelling species, particularly
those, such as snails or reptiles, that have less ability to disperse
than birds do. These analyses could be further informed by
incorporating into this approach species’ functional traits, such as
body size and diet.


Figure 1 | Bird biodiversity. The theory of island biogeography, proposed in the 1960s, is a milestone in
our understanding of how biodiversity is established and maintained. Valente et al. tested this theory on a
global scale using data for island-dwelling birds from 41 archipelagos. Their results confirm key predictions
of this theory. a, The authors report that two-thirds of the birds native to archipelagos arose from a process
of species formation called anagenesis, which typically occurs on isolated islands (those far from the
mainland). Anagenesis has given rise to birds such as the Bolle’s pigeon (Columba bollii) of the Canary
Islands. b, Another process of species formation, called cladogenesis, is most common on large, isolated
islands. The authors report that of the birds they studied, a group called Hawaiian honeycreepers had the
greatest number of species (33 in total) that arose by cladogenesis. One example of such species is Hawaii’s
iiwi, or scarlet honeycreeper (Drepanis coccinea).

The implications of Valente and colleagues’ results extend beyond the
field of island biogeography. For example, characterization of the
relationship between island area and extinction rate contributes to
the discussion in conservation science about how to assess the
effects on biodiversity of habitat loss and fragmentation during the
Anthropocene (the name proposed for the current phase of planetary
history, in which human activity has a dominant influence on the
environment). This is relevant to today’s world, in which natural
habitats are becoming increasingly isolated.

An important aspect of Valente and colleagues’ study is their
approach of considering an archipelago as a unit, rather than
focusing on individual islands. This aligns with the idea that
archipelagos might be the most appropriate units in which to frame
analyses of biodiversity at large spatial and temporal scales.
Analysis of large spatial units in biogeography is not a new
approach; however, these units generally take the form of geometric
shapes, such as grid squares, that do not directly correspond to
ecological boundaries (for example, those defined by vegetation type)
and their associated communities. By contrast, archipelagos represent
natural units. It is likely that substantial strides will be made in
our understanding of island biogeography from further analyses of
ecological patterns and processes undertaken at the archipelago
scale, especially if geological dynamics are incorporated. To
paraphrase E. O. Wilson: it is archipelagos that are “the logical
laboratories of biogeography and evolution”.


Tropical carbon sinks are out of sync


Anja Rammig


A survey of tree establishment, growth and mortality shows
that the rate at which Amazonian tropical forests take up
carbon dioxide has slowed since the 1990s, whereas signs of a
potential slowdown in Africa appeared only in 2010. See p.80


The total area of the world that is covered by tropical forest is
declining because of deforestation, land degradation and fires, a
trend that has increased over the past few years. At the same time,
human-induced climate change is altering the functioning of tropical
forests. During the 1990s and early 2000s, structurally intact
tropical forests actively removed carbon from the atmosphere (in the
form of carbon dioxide) through photosynthesis, and stored it as
biomass. Such forests have been responsible for about 50% of the
terrestrial carbon sink. Hubau et al. report on page 80 that this
globally crucial tropical carbon sink is becoming saturated in both
Amazonian and African rainforests, but with different patterns of
change.

Forests act as a net carbon sink when the amount of carbon gained
through the establishment of new trees and tree growth is larger than
the amount lost through tree mortality. In these circumstances, the
quantity of carbon stored in the biomass increases over time. The
interplay of carbon gains, losses and stocks determines the period of
time for which carbon remains in the forest, which is known as the
carbon residence time.
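
The bookkeeping in the paragraph above can be written down in a few lines. This is a minimal sketch, not Hubau and colleagues’ method: it assumes that the net sink is simply gains minus losses, and that the residence time can be approximated as the carbon stock divided by the loss flux (a common steady-state approximation). The function names and the numbers in the usage note are hypothetical.

```python
def net_carbon_sink(gains, losses):
    """Net sink in the units of the inputs (for example, Mg C per hectare per year).

    Positive values mean the forest biomass is accumulating carbon.
    """
    return gains - losses

def carbon_residence_time(stock, loss_flux):
    """Approximate residence time in years: stock divided by the outgoing flux.

    Assumes a near-steady state; stock in Mg C per hectare,
    loss_flux in Mg C per hectare per year.
    """
    if loss_flux <= 0:
        raise ValueError("loss_flux must be positive")
    return stock / loss_flux
```

With illustrative inputs of gains = 2.5 and losses = 2.0, the forest is a net sink of 0.5; a stock of 150 with a loss flux of 2.0 corresponds to a residence time of 75 years. Rising losses therefore both weaken the sink and shorten the residence time.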

Hubau and colleagues monitored tree establishment, growth and
mortality in 244 undisturbed old-growth forest plots in Africa across
11 countries, between 1968 and 2015, and compared their data with
similar measurements from 321 plots in Amazonia. Such long-term
monitoring is essential for identifying trends and drivers of the
carbon sink in forest biomass, but is highly challenging and costly
in terms of coordination, labour and funding, particularly in the
tropics, where access to field sites is difficult and working
conditions are harsh (Fig. 1). The authors find that the carbon sink
in African tropical-forest biomass was stable for the 30 years up to
2015, in contrast to the sink in Amazonian tropical forests, for
which the annual net amount of accumulated carbon started to decline
around 1990 (Fig. 2). So what drives the slowdown of the tropical
carbon sink, and why are there differences between Amazonian and
African tropical forests?

The authors report a long-term trend of increasing carbon gains in
the forests on both continents throughout the period studied, which
correlates with the increase in atmospheric CO2 concentrations. They
attribute the rising gains to CO2 fertilization, an increase in
carbon uptake by plants that occurs as atmospheric CO2 levels rise.
However, they find that increasing mean annual temperatures and
drought since 2000 have reduced tree growth and thus offset the
increase in carbon gains, with smaller reductions in Africa than in
Amazonia.


Figure 1 | Taking an inventory of the Amazon rainforest. A researcher takes measurements of a tree trunk at a height of 2 metres above the ground. Long-term
monitoring such as this can be used to estimate the amount of carbon stored by tropical forests.

Figure 2 | Estimates and projections of tropical carbon sinks. Hubau et al. have estimated the net amount
of carbon that was absorbed from the atmosphere by tropical forests (the tropical carbon sinks) in
Africa and Amazonia for the period from 1968 to 2015, using measurements of tree establishment, growth
and mortality; only estimates from 1990 onwards are shown. The data show that the sink in Amazonia has
declined since the 1990s, whereas the African sink was stable for the 30 years up to 2015. The authors also
estimated the carbon sinks using statistical models, which they extrapolated to 2040. The extrapolations
suggest that, by 2030, the carbon sink in Africa will be 14% lower than in 2010-15, whereas the Amazonian
carbon sink will reach zero by 2035. Data shown are mean values; see Fig. 3 of ref. 4 for confidence intervals.

Hubau et al. go on to show that high carbon gains persisted for
longer in Africa than in Amazonia because the warming rate was
slower, there were fewer droughts and air temperatures were generally
lower (because African forests are located at higher elevations).
And, in contrast to an earlier study, the authors were able to
clearly attribute the decline of carbon gains in Amazonia to
increasing temperatures and repeated extreme drought events, on the
basis of a statistical analysis of their data. The researchers find
no signs of the CO2-fertilization effect levelling off on either
continent.

Although the authors attribute the decline in carbon gains on both
continents to climatic drivers, other limiting factors might be
responsible, such as competition between trees for light and
nutrients, and the general availability of nutrients on each
continent. These factors were not considered in their statistical
analysis, but might further constrain tree growth and weaken the sink
as atmospheric CO2 concentrations continue to increase. Such
limitations have been hinted at from experiments in which the
atmospheric concentration of CO2 is enriched in a specific area of an
ecosystem, but no such experiment has been carried out in highly
diverse, old-growth tropical forests such as those in Africa and
Amazonia.

In addition to the trends in carbon gains, Hubau et al. find that
carbon losses in Africa were stable from the 1990s until a decade
ago, and then started to increase. By comparison, carbon losses in
Amazonia had already started to increase in the 1990s. This
continental difference seems to be because trees in Amazonia grow
faster and have shorter carbon residence times than do those in
African forests. Carbon dioxide fertilization might increase growth
rate and carbon gains, but it also leads to quicker losses:
CO2-fertilized trees grow fast and die young, and therefore might not
necessarily contribute to the carbon sink in the long term. The
authors find that tree mortality associated with chronic long-term
heat and drought leads to increased carbon losses, and that this
effect is more pronounced in Amazonian than in African tropical
forests as a result of accelerated warming rates in Amazonia since
2000. Data from the most intensively monitored African plots indicate
that carbon losses in those forests began increasing from about 2010.

The authors extrapolate their statistical models up to the year 2040,
and thereby suggest that the carbon sink will decline on both
continents. They estimate that, by 2030, the carbon sink in Africa
will be 14% lower than in 2010-15, whereas the Amazonian carbon sink
will reach zero by 2035 (that is, there will be no net carbon uptake
from the atmosphere). These extrapolations need to be interpreted
carefully, however, because they are in striking contrast to
projections made by global models, which predict a strong, continuing
carbon sink due to CO2 fertilization in intact tropical forests.
Recently reported models of vegetation growth that consider nutrient
cycling show that the Amazonian-forest carbon sink is strongly
constrained by the availability of phosphorus in soils. Hubau and
colleagues’ findings underline the need to understand other factors
that affect tree mortality and forest dynamics, in addition to such
nutrient feedbacks, so that these can be integrated into global
models.

So, what does a pan-tropical decline of the carbon sink in intact
forests imply for the current climate crisis? Calculations of the
maximum amount of anthropogenic carbon emissions that can be emitted
to limit global warming to well below 2°C (the goal of the 2015 Paris
climate agreement) count on the continuation of a large tropical
carbon sink. Hubau and co-workers’ finding that tropical sinks are
disappearing and could very soon turn into carbon sources suggests
that, as well as strong protection of intact tropical forest, even
faster reductions of anthropogenic greenhouse-gas emissions than
those set out in the agreement will be needed to prevent catastrophic
climate changes.


Anja Rammig is at the Technical University 
of Munich, TUM School of Life Sciences, 
Weihenstephan, Freising 85354, Germany. 
e-mail: anja.rammig@tum.de 


1. Song, X.-P. et al. Nature 560, 639–643 (2018).
2. Trumbore, S., Brando, P. & Hartmann, H. Science 349, 814–818 (2015).
3. Pan, Y. et al. Science 333, 988–993 (2011).
4. Hubau, W. et al. Nature 579, 80–87 (2020).
5. Körner, C. Science 355, 130–131 (2017).
6. Brienen, R. J. W. et al. Nature 519, 344–348 (2015).
7. Norby, R. J. et al. New Phytol. 209, 17–28 (2016).
8. Huntingford, C. et al. Nature Geosci. 6, 268–273 (2013).
9. Fleischer, K. et al. Nature Geosci. 12, 736–741 (2019).
10. Steffen, W. et al. Proc. Natl Acad. Sci. USA 115, 8252–8259 (2018).


Nature | Vol579 | 5 March 2020 | 39 


Review 


Single-particle spectroscopy for functional nanomaterials


Jiajia Zhou, Alexey I. Chizhik, Steven Chu & Dayong Jin


https://doi.org/10.1038/s41586-020-2048-8
Received: 26 February 2019
Accepted: 7 January 2020
Published online: 4 March 2020


Tremendous progress in nanotechnology has enabled advances in the use of
luminescent nanomaterials in imaging, sensing and photonic devices. This
translational process relies on controlling the photophysical properties of the
building block, that is, single luminescent nanoparticles. In this Review, we highlight
the importance of single-particle spectroscopy in revealing the diverse optical
properties and functionalities of nanomaterials, and compare it with ensemble
fluorescence spectroscopy. The information provided by this technique has guided
materials science in tailoring the synthesis of nanomaterials to achieve optical
uniformity and to develop novel applications. We discuss the opportunities and
challenges that arise from pushing the resolution limit, integrating measurement and
manipulation modalities, and establishing the relationship between the structure and
functionality of single nanoparticles.


As was pointed out by Richard Feynman in his classic lecture ‘There’s
plenty of room at the bottom’, the manipulation of atoms or molecules
will allow what we now call nanostructures to be tailored with
accuracy and with desirable physical and chemical properties. This
requires establishing the relationship between structure and
properties for each nanostructure, and observing them with
sub-nanometre resolution. Although there is still a long way to go to
achieve Feynman’s vision, there has been tremendous progress in
nanotechnology and characterization techniques towards realizing the
possibilities that he conceived decades ago.

Among the breakthroughs that have stimulated the growth of
nanotechnology is the controlled synthesis and characterization of
luminescent nanoparticles. The use of single-photon detectors has
boosted the development of bright and photostable nanoparticles
with tunable emission properties, which have attracted great attention
from multiple disciplines. Super-resolution microscopy methods
that can resolve single nanoparticles beyond the diffraction limit have
become another trigger of progress in nanotechnology.

Single luminescent nanoparticles, although synthesized from the same
batch, can often be heterogeneous in terms of size, shape, defects,
surface groups and charges. These are core issues in fundamental
research related to material science, crystallology and interfacial
chemistry, and are crucial for reproducibility, functionality and
applications. Electron microscopy is an essential tool for observing
the structure of individual nanoparticles, but it barely accesses
their optical functions. Conventional fluorescence spectroscopy can
show certain signs of heterogeneity in nanoparticle properties (for
instance, ensemble-spectrum broadening or multi-exponentiality of
fluorescence decay), but these are derived by ensemble averaging.
Single-particle spectroscopy (SPS) is a rapidly developing class of
techniques that enable individual features of single particles to be
discerned and thus provide direct information on their heterogeneity.
By measuring the heterogeneity of the optical properties of a single
particle, it is possible to analyse the influence of particle size,
shape, surface state, composition, geometric orientation and the
local environment.
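
To make the point about ensemble averaging concrete, the following minimal sketch shows how a population of particles that each decay single-exponentially, but with different lifetimes, produces a multi-exponential ensemble decay. It is an illustration under stated assumptions rather than a method from the literature discussed here: equal weights are assumed unless given, and the function names and lifetimes are hypothetical.

```python
import math

def ensemble_decay(t, lifetimes, weights=None):
    """Normalized ensemble intensity: weighted sum of single-exponential decays."""
    if weights is None:
        weights = [1.0 / len(lifetimes)] * len(lifetimes)
    return sum(w * math.exp(-t / tau) for w, tau in zip(weights, lifetimes))

def apparent_lifetime(t, lifetimes, dt=1e-3):
    """Local lifetime -1 / (d ln I / dt), estimated by a central finite difference."""
    slope = (math.log(ensemble_decay(t + dt, lifetimes))
             - math.log(ensemble_decay(t - dt, lifetimes))) / (2 * dt)
    return -1.0 / slope
```

For a uniform population, `apparent_lifetime` returns the same value at every delay. For a mixed population such as lifetimes of 2 and 10 (arbitrary time units), the apparent lifetime drifts from a short value at early delays towards 10 at late delays. The ensemble curve alone cannot reveal which particles are responsible, which is the information that single-particle measurements supply.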

Advances in material synthesis methods and SPS enhance the quality
of new types of nanoparticle and the homogeneity of their optical
properties. However, the more we pursue perfection in nanoparticle
design, the greater the challenges become. With the growth of the
complexity of material systems and characterization techniques,
future advances in luminescent nanomaterials will be built on
interdisciplinary integration of materials sciences, optical imaging
and spectroscopy beyond the diffraction limit. This Review focuses on
inorganic luminescent nanoparticles, which have been the subject of
many studies. We first present new insights that have been gained
through SPS techniques. We show the advantages of controlled
synthesis towards achieving optical uniformity and field-responsive
properties of single nanoparticles, which can be tailored for diverse
applications. We discuss the current opportunities and challenges in
advancing next-generation SPS.


Luminescent nanoparticles under SPS 

Since the first demonstration of single-molecule measurements,
rapid progress in optical microscopy has made it possible to routinely
see the fluorescence of individual emitters using highly sensitive
photodetectors. Advances in single-photon avalanche diodes and
electron-multiplied charge-coupled device cameras with high quantum
efficiency and low noise have enabled the wide application of
single-molecule fluorescence spectroscopy and imaging in commercial and
custom-built microscopy systems. For more than a decade, SPS has
advanced understanding of the heterogeneity of nanoparticles and
facilitated discoveries of the underpinning photophysics. Correlative
microscopic SPS methods have been developed in both a passive
detection manner, such as in situ atomic-force microscopy (AFM) or
ex situ transmission and scanning electron microscopy (TEM and SEM,
respectively), and a positive manipulation manner, such as the use
of an AFM tip or optical tweezers (Fig. 1). The optical heterogeneity
information provided by SPS helps to guide the synthesis of uniform
nanoparticles and improve their performance in each application
(Fig. 2). In the following, we discuss insights obtained from SPS for
different nanoparticles, and exemplify several situations in which the
fundamental optical properties of nanoparticles (uniform or
heterogeneous) make a difference in applications.

Fig. 1 | Correlative methods providing deterministic information about a
single nanoparticle. a, In situ AFM integrated with optical microscopy (top)
reveals the particle amount, geometry and two-dimensional (2D) orientation
(bottom). b, TEM/SEM/STEM correlation (top) confirms the particle amount
and composition (bottom). c, Optical tweezers (top) manipulate the
three-dimensional orientation of a single nanoparticle (bottom) by rotating the
polarization of the trapping laser. d, Super-resolution microscopy (top)
resolves the composition and transition dipole orientation of the
nanoparticles, in addition to the particle amount and its two-dimensional
orientation (bottom).


Quantum dots

Quantum dots are artificial atoms with discrete, atomic-like energy
levels and a spectrum of narrow transitions. However, populations
of different sizes and shapes in an ensemble state induce
inhomogeneous spectral shifts and deformation. SPS has been used to decode
the discrete nature of excited states of quantum dots since 1996.
In the same year, fluorescence intermittency (blinking) from single
CdSe quantum dots was revealed, in which the intensity jumped
between on- and off-states under continuous excitation. Later, it was
shown that blinking of quantum dots can have different
characteristics. The commonly observed (so-called A-type) correlated blinking of
fluorescence intensity and modulation of excited-state lifetime are due
to charging and discharging of the nanocrystal core. In B-type blinking,
changes in emission intensity are not accompanied by modulation in
emission dynamics. In thick-shell CdSe/CdS quantum dots, another
type of blinking was observed, in which fluctuations of the
excited-state lifetime were accompanied by stable emission intensity. Studies
of single-particle fluorescence blinking led to advanced understanding
of charge recombination and trapping processes in quantum
dots and enabled the design of quantum dots with reduced
blinking.

SPS has revealed the relationship between emission polarization of
single quantum dots and their aspect ratio. The dependence of the
polarization on the particle shape and orientation of single CdSe/CdS
nanorods has been characterized using correlated AFM and optical
microscopy (Fig. 1a). In these experiments, the orientation angles
of a single nanorod were identified using AFM as 5°, 125° and 135°, in
good agreement with the emission polarization angles of 7°, 124° and
133°, respectively.
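
The agreement quoted above can be checked with a short calculation. The sketch below uses the six angles given in the text; the helper function is a hypothetical illustration that treats orientations as axial quantities (an axis at 0° is the same as one at 180°), an assumption that matters only for angles near the wrap-around.

```python
def axial_difference(angle_a_deg, angle_b_deg):
    """Smallest difference between two axis orientations, which repeat every 180 degrees."""
    diff = abs(angle_a_deg - angle_b_deg) % 180.0
    return min(diff, 180.0 - diff)

afm_angles = [5.0, 125.0, 135.0]           # nanorod orientations from AFM (from the text)
polarization_angles = [7.0, 124.0, 133.0]  # emission polarization angles (from the text)

# Pairwise mismatch between structural and optical orientation, in degrees.
differences = [axial_difference(a, p) for a, p in zip(afm_angles, polarization_angles)]
```

The three mismatches are 2°, 1° and 2°, so the structural and optical measurements agree to within about 2° for every orientation reported.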

Correlated AFM and single-particle fluorescence microscopy confirmed
a suboptimal ensemble-fluorescence quantum yield of quantum dots
caused by non-emissive particles. A correlative study of atomic
structure, chemical composition and time-resolved single-photon
counting of the same single quantum dots revealed inhomogeneity
in the quantum yields of single nonblinking giant CdSe/CdS quantum
dots. In particular, high-angle annular dark-field detection in
scanning transmission electron microscopy (STEM) (Fig. 1b), paired with
energy-dispersive X-ray spectroscopy, has been used to visualize the
distance between the core and the surface of a CdSe/CdS architecture
for spectroscopic correlation.

The study of heterogeneity has motivated the use of controlled
growth to make optically uniform quantum dots, which has a key
role in achieving specific performance indicators for applications.
For instance, Fan and co-workers reported that uniform biaxially
strained quantum dots led to a two-fold reduction of the single-particle
linewidth compared with hydrostatically strained quantum dots; the
former can generate continuous-wave lasing with a low threshold of
8.4 kW cm−2 (Fig. 2a), whereas the latter can only generate pulsed-laser
emissions with a threshold more than two times higher.


Fluorescent nanodiamonds 
Fluorescent nanodiamonds that contain colour centres, such as
nitrogen-vacancy centres, have been used as single-photon sources since the
early 2000s. SPS is an essential tool in photon antibunching, which is a
standard technique for verification of fluorescence from a single
quantum emitter. The method exploits the fact that a single emitter emits
only one photon upon transition from the excited to the ground state.
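
The antibunching criterion can be sketched numerically. The example below assumes the idealized textbook case, which is not spelled out in this Review: for N identical, independent single-photon emitters with no background, the zero-delay second-order correlation is g2(0) = 1 − 1/N, so a measured value below 0.5 is commonly taken as evidence of a single emitter. The function names and the 0.5 threshold are illustrative choices.

```python
def g2_zero(n_emitters):
    """Ideal zero-delay correlation g2(0) = 1 - 1/N for N identical independent emitters."""
    if n_emitters < 1:
        raise ValueError("n_emitters must be at least 1")
    return 1.0 - 1.0 / n_emitters

def is_single_emitter(measured_g2_zero, threshold=0.5):
    """Apply the conventional criterion g2(0) < 0.5 (background-free assumption)."""
    return measured_g2_zero < threshold
```

In this idealized picture one emitter gives g2(0) = 0 and two emitters give 0.5, which is why 0.5 serves as the dividing line. Real measurements include background and finite timing resolution, so the threshold should be treated as a guide rather than a strict test.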
SPS has provided many insights into the photostability of
nanodiamonds when reducing their size. In 2010, a discrete 5-nm nanodiamond
was found to blink and bleach over several hours of illumination.
In 2014, the silicon vacancy in a 1.6-nm nanodiamond was reported
to be chemically stable, but to blink and be optically unstable, with
fluorescence lasting only for tens of minutes. Presumably owing to
their surface defects, smaller nanodiamonds usually contain a smaller
fraction of active emitters and their emission is not uniform in
intensity. Current approaches to fabricating nanodiamonds have limited
controllability in particle size compared to the wet-chemistry synthesis
methods used for other luminescent nanoparticles. There is a need
for a synthesis approach that can produce both morphologically and
optically uniform nanodiamonds.

Photostable nanoparticles are useful for long-term tracking with
high temporal resolution. The single-particle tracking technique
can be used to detect minor abnormalities, reveal single-molecule
biophysical mechanisms and diagnose diseases at an early stage.
Haziza and co-workers developed a quantitative assay in mouse
hippocampal neurons by tracking single 30-nm nanodiamonds (Fig. 2b).
They achieved a temporal resolution of 50 ms and suggested that this
assay was sufficiently sensitive to detect a change of protein
concentration (~30%). In a control experiment, they demonstrated that randomly
blinking quantum dots could affect the trajectory reconstruction (for
example, owing to the existence of dark periods and inferred segments),
leading to biased tracking parameters.

Fig. 2 | Optical uniformity of nanoparticles advances biological and
nanophotonics applications. a, Left, biaxially strained quantum dots
exhibiting narrowed emission linewidth at a single-particle level, facilitating
low-threshold laser generation. Middle, normalized integration of the signal
emitted from a quantum dot laser as a function of peak power. Right, emission
spectra above and below the lasing threshold, corresponding to the points
indicated by the arrows in the middle panel. a.u., arbitrary units. Image adapted
from ref. (Springer Nature). b, Left, use of a single nanodiamond for reliable
transport tracking, owing to its superior photostability. Middle, the
trajectories of two single fluorescent nanodiamonds. Right, go
(nanodiamond 2) and stop (static) phases are shown in grey. Image adapted
from ref. (Springer Nature). c, Left, uniform-brightness single UCNPs
enabling the digital assay of biomarkers. Middle, upconversion microscopy
images of dilutions of prostate specific antigen (PSA) in 25% serum. Right,
detection limit of 1.2 pg ml−1 in the digital readout. Error bars indicate the
standard deviation from three replicate wells. Image adapted with permission
from ref. (copyright 2017 American Chemical Society). d, Excited-state
lifetime measurement of tunable UCNPs enabling time-domain optical
multiplexing applications, including data storage and anti-counterfeiting
(right). The middle panel shows the lifetime distributions of a series of UCNPs.
Image adapted from ref. (Springer Nature).


Upconversion nanoparticles 
Upconversion nanoparticles (UCNPs) are typically (co-)doped by
lanthanide ions and represent an emerging type of nonlinear optical
material that absorbs low-energy near-infrared photons to produce
high-energy emissions in the visible and ultraviolet regions. The
single-particle nature of UCNPs is usually confirmed by the correlation
between AFM/TEM/SEM (Fig. 1a, b) images and confocal/wide-field
fluorescence images. Single UCNPs are photostable for hours.
Quantitative measurements of the brightness of individual nanoparticles
have revealed that inert-shell passivation could prevent the
'short-circuit' effect of the migration of sensitized photon energy to
surface quenchers. Measurements of the fine-splitting emission spectra in
individual 8-nm UCNPs have indicated possible heterogeneity of their
surface conditions or disorder in the distribution of lanthanides within
the nanoparticle.

Because of their nonlinear nature, UCNPs behave very differently in
ensemble spectroscopy and in SPS measurements. Ensemble-spectroscopic
studies have concluded that optimal doping concentrations
of Tm³⁺ and Er³⁺ should be smaller than 1 mol% and 2 mol% in a β-NaYF₄
host, respectively, limited by the concentration quenching effect.
However, SPS has revealed that concentration quenching is highly
power-dependent. When the excitation power density reaches 10⁶ W cm⁻²
or higher, single UCNPs highly doped with 8 mol% Tm³⁺ or 20 mol%
Er³⁺ are orders of magnitude brighter than conventional low-doped
UCNPs. Recently, a core-shell-shell design of single UCNPs highly
doped with 8 mol% Er³⁺ in a NaYbF₄ host has been reported to have
high brightness, each emitting about 200 photons per second under
a low excitation-power density of 8 W cm⁻².

Compared with other fluorescent nanoparticles, UCNPs have better
brightness uniformity. Uniform-intensity UCNPs have enabled an
ultrasensitive quantification of biomolecules by directly counting the
number of single nanoparticles. This type of quantification is known
as the single-molecule (digital) upconversion-linked immunosorbent
assay (digital ULISA) (Fig. 2c). The quantification of single molecules
is anticipated to be possible once one-to-one binding between the
UCNP and the antibody can be realized.

The excited-state lifetime is a unique quantity with a relatively small
deviation compared to other spectroscopic signals, such as emission
intensity and peak ratios in UCNPs or other lanthanide-doped
nanoparticles. Lifetime uniformity allows one to achieve a single-population
distribution of lifetimes at both the single-nanoparticle and ensemble
levels, facilitating accurate sensing by using the lifetime modality.
On the basis of single-population characteristics, tunable lifetime
control is a promising tool for multiplexing applications such as imaging
and data storage (Fig. 2d).
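
As a rough illustration of lifetime-based multiplexing, the sketch below estimates a lifetime from photon arrival times and assigns it to the nearest of a few lifetime "codes". It assumes a background-free mono-exponential decay, for which the maximum-likelihood lifetime is simply the mean arrival time; the channel values, function names and simulated data are hypothetical and are not taken from the work cited here.

```python
import math
import random


def estimate_lifetime(arrival_times_us):
    """Maximum-likelihood lifetime of a background-free mono-exponential
    decay: the mean photon arrival time after the excitation pulse."""
    if not arrival_times_us:
        raise ValueError("need at least one photon arrival time")
    return sum(arrival_times_us) / len(arrival_times_us)


def assign_channel(lifetime_us, channel_lifetimes_us):
    """Index of the nominal lifetime channel closest on a logarithmic scale."""
    return min(
        range(len(channel_lifetimes_us)),
        key=lambda i: abs(math.log(lifetime_us / channel_lifetimes_us[i])),
    )


# Hypothetical lifetime codes (microseconds); illustrative values only.
channels_us = [50.0, 150.0, 450.0]

# Simulate 5,000 photons from a particle with a 150-us lifetime.
rng = random.Random(0)
photons = [rng.expovariate(1.0 / 150.0) for _ in range(5000)]

tau = estimate_lifetime(photons)
decoded = assign_channel(tau, channels_us)
print(f"estimated lifetime: {tau:.1f} us, decoded channel: {decoded}")
```

Narrow single-population lifetime distributions matter here because the channels can only be packed closely if particle-to-particle lifetime scatter stays well below the channel spacing.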


Carbon dots 

Carbon dots are a new type of carbon-based luminescent nanoparticle.
Carbon dots consist of a carbon crystalline or amorphous core
and various luminescent and non-luminescent surface groups, which
provide a combination of unique or unexpected optical properties.
A growing number of investigations aim to understand the luminescence
mechanism of carbon dots. SPS has shown that whereas most
of their luminescence properties largely resemble those of organic
molecules, the interaction of luminescent surface groups with a carbon
core and non-luminescent groups could determine the blinking
character of carbon dots. The introduction of specific functional
groups to carbon dots was found not only to affect their surface charge
to display distinctive colours, but also to direct their selective
labelling. Negatively charged green carbon dots were localized within the
endosomes/lysosomes while positively charged blue carbon dots were
mostly located within the nucleus.


Lead halide perovskite nanoparticles 

Lead halide perovskite nanoparticles (LHPs) are colloidal nanocrystals
with an APbX₃-type perovskite lattice, where A is a monovalent cation
and X is a halide ion. In recent years, LHPs have attracted an explosive
increase in research interest regarding their controlled synthesis,
optical characterization and optoelectronic-device applications. SPS
has enabled the comprehensive study of the optical performance of
LHPs, including their blinking behaviour, exciton dynamics and
single-photon emission.


Responses to external fields 

Apart from intrinsic electronic transitions that enable nanomaterials to
display spectacular emissions in different optical dimensions, excitons
and electrons can also respond to a stimulus in an external field, such as
temperature, magnetic and electrical fields. Their diverse luminescent
emissions can 'dance' with the external fields, which not only provides
new angles to advance knowledge in their photophysical and material
properties, but also enables the development of field-responsive
sensors using single nanoparticles. Statistical measurements of
optical signals from single nanoparticles can identify whether those
from the same synthesis batch have a synchronous response upon
application of the stimulus. Asynchronous responses usually contribute to
the study of the heterogeneity and branched properties of nanoparticles.
A synchronous response can give universal structure information
about nanoparticles, demonstrating their uniformity.


Cryogenic and vacuum conditions

Cryogenic and vacuum conditions can be applied to SPS to reveal exciton
behaviours, which provide new insights for decoding emitting
states that are complex in atmospheric conditions. By inspecting
120 single CdSe/CdS quantum dots at 300 K in air, at 300 K in vacuum
and at 30 K in vacuum, it was found that the nanocrystal was mostly in
a charged state under vacuum, and oscillated between a charged state
and a neutral state at 300 K in air, where non-radiative Auger
recombination was thermally activated (Fig. 3a). The correlated quantum
yield and lifetime change suggested that the enhanced Auger
recombination was due to electron localization on the nanoparticle surface.
Low-temperature SPS has been used to analyse the excited-state
lifetime, blinking and polarization of single LHPs to confirm the
existence of their bright triplet excitons.


Magnetic field 
Magnetic fields are used in SPS to characterize the Zeeman splitting
and Rashba effect of the excited states of single nanoparticles.
For example, magneto-SPS has been used to record the evolution of
the emission spectra of CsPbBr₃ LHPs as a function of magnetic-field
amplitude. By measuring 17 nanoparticles, two subpopulations with
different Zeeman splitting evolutions were identified. Among these
nanoparticles, five had their z axis nearly parallel to the magnetic field,
and the others did not display measurable Zeeman splitting because
they were randomly orientated with respect to the magnetic field.
Cryogenic temperature conditions have usually been applied in
magnetic-field SPS to minimize non-radiative interference. Very
recently, Tamarat and co-workers observed the direct spectroscopic
signature of a dark singlet state from single LHPs in a low-temperature
(4 K) magnetic field. By assessing 28 nanoparticles, the singlet state
was measured to be several millielectronvolts lower than the bright
triplet state. When turning on the magnetic field, they observed an extra
redshifted peak and a slow component in the lifetime curve that was
associated with the singlet state (Fig. 3b). It was the magnetic field that
induced the dark-bright coupling, which enabled relaxation from the
triplet to the singlet state, followed by emission from the singlet state.
This was further confirmed by measurements of the emission, lifetime
and polarization spectra in a variable magnetic field (from 0 T to 7 T).

The responses of single nanoparticles to temperature and magnetic
fields can be used to develop nanoscale sensors. The ground-state
spin triplet of the nitrogen-vacancy colour centre is sensitive to both
a magnetic field and thermally induced lattice strain. These properties
underpin nanomagnetometry and temperature sensing using single
nitrogen-vacancy-containing nanodiamonds. Tools such as the
AFM tip (Fig. 1a) and optical tweezers (Fig. 1c) can be used to move
single-nanodiamond sensors in two- and three-dimensional spaces.
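
To give a feel for the energy scale of the Zeeman splittings discussed in this section, the sketch below evaluates the textbook linear Zeeman relation, splitting = g × μ_B × B. The g-factor of 2 is an assumed free-electron-like placeholder, not a value reported for LHPs, and the function name is illustrative.

```python
# Bohr magneton in eV per tesla (CODATA 2018 value).
BOHR_MAGNETON_EV_PER_T = 5.7883818060e-5


def zeeman_splitting_mev(g_factor, field_tesla):
    """Linear Zeeman splitting g * mu_B * B, returned in millielectronvolts."""
    return g_factor * BOHR_MAGNETON_EV_PER_T * field_tesla * 1e3


# Assumed g-factor of 2 (placeholder) at the largest field quoted above, 7 T.
splitting = zeeman_splitting_mev(2.0, 7.0)
print(f"Zeeman splitting at 7 T for g = 2: {splitting:.2f} meV")
```

Under these assumptions the splitting is below 1 meV, which is consistent with the use of cryogenic temperatures: narrow emission lines are needed to resolve shifts of this size.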


Electric field 

The electric field has provided new insights into the excited-state
dynamics of single semiconductor nanoparticles. The correlated and
uncorrelated fluctuation of fluorescence intensity and lifetime for
A- and B-type blinking behaviours in quantum dots have been determined
by recording the fluorescence transients from single CdSe/CdS quantum
dots with increasing negative potential. The quantum-confined
Stark effect in single semiconductor quantum dots and nanorods can
be used to probe voltage changes at the nanoscale. As shown in Fig. 3c,
in contrast to the type-I system (in which conduction- and valence-band
minima spatially overlap), core-shell nanorods with type-II
Fig. 3 | Application of external fields to stimulate the response of single
nanoparticles dynamically. a, Temperature-dependent SPS statistics
revealing the thermal activation of non-radiative Auger recombination in
charged CdSe/CdS quantum dots, induced by electron delocalization.
The schematic shows the band alignment for the negatively charged trion in a
CdSe/CdS nanocrystal. Auger processes are suppressed when the trion is in its
ground state in a type-I structure (left), but allowed when one of the electrons is
delocalized in the shell and interacts with the surface in a quasi-type-II
structure (right). Bottom, fluorescence intensity histograms (black crosses)
and the corresponding charging states of a quantum dot at 300 K in air (top
graph), at 300 K in vacuum (middle graph) and at 30 K in vacuum (bottom
graph). The noise intensity is shown by grey open circles and is fitted with a
Poisson distribution (blue line). X, neutral state; X*, charged state; τ, lifetime;
QY, quantum yield. Image adapted from ref." (Springer Nature). b, At 4 K, LHPs


band alignment (in which conduction- and valence-band minima are
spatially separated) showed a stronger field effect because the lower
bandgap of the shell allowed the electron to be delocalized in the shell.
When the shell could be elongated on one side of the nanoparticle, the
field effect became even stronger. Quasi-type-II nanorods have been
recently used for membrane potential sensing, and it was found that
nanorods need to be aligned perpendicular to the membrane surface
for high-sensitivity sensing (Fig. 3c).




show emission from a dark singlet state with a slow decay rate when the
magnetic field strength increases from 0 to 7 T. The schematic shows the
Zeeman splitting and polarization of the fine-structure transitions for an LHP
with an orthorhombic crystal structure positioned with the z axis parallel to the
magnetic field, B, and to the collection axis (Faraday configuration). The
three-line spectrum of an LHP nanoparticle at 4 K in zero field changes into a
four-line spectrum under a magnetic field (7 T). The fluorescence decay curve
under 7 T shows the singlet state component. Image adapted from ref.“ (Springer
Nature). c, Type-II semiconductor nanorods exhibit higher voltage-sensing
sensitivity than type-I quantum dots, owing to their different band alignment.
Quasi-type-II single nanorods have been used for membrane potential sensing.
L, length of the nanoparticle. Image adapted with permission from ref.”
(copyright 2013 American Chemical Society) and from ref.”” (American
Association for the Advancement of Science).


Future opportunities and challenges 

Current SPS techniques will continue to advance nanoscale
characterizations of the photophysical properties of existing and future materials.
Efforts are ongoing to realize new desirable functionalities in smaller
and more efficient nanomaterials and to integrate them through both
heterogeneous and hybrid designs. This will encourage the development
of more powerful SPS-based technologies.


Nature | Vol 579 | 5March 2020 | 45 


Review 


Many biomedical and intracellular applications demand nanoscopic 
molecular probes and sensors as small as a few nanometres in size. 
However, for sizes smaller than 10 nm, it is challenging to form uniform
nanoparticles and control their morphological and optical properties. 
The non-uniform size distribution of semiconductor quantum dots 
will lead to the dispersion of their spectral characteristics. The 
brightness of emitter-doped nanomaterials, such as nanodiamonds 
and UCNPs, can be considerably reduced when the volume of each 
nanoparticle drops. A reduced thickness of passivating shells also 
degrades the quantum yield of luminescent nanoparticles. These 
challenges require new methods of controlling the surface chemistry 
of nanoparticles. 

Although highly desirable for many applications, nonlinear 
upconversion probes do not yet emit luminescence with brightness 
comparable to that of linear ones. The focus for strong nonlinear 
upconversion luminescence needs to be on the design of organic- 
inorganic hybrid materials and proper management of efficient 
energy transfer, to achieve high efficiencies in photon sensitization, 
energy transfer and upconversion emission. Such hybrid systems
demand super-resolution SPS techniques capable of characterizing 
each step of sophisticated photophysics, particularly at the materi- 
als’ interface. 

The addition of the new functionality of luminescent nanoparticles
is anticipated to revolutionize some key scientific and technological
areas. For example, the engineering of chirality and its interaction with
biomolecules opens up new perspectives in gene analysis. Precisely
controlling heterogeneous nanomaterials and/or arbitrarily assembling
these building blocks can integrate multiple functionalities into
a single nanoparticle. The accuracy of controlling material growth
and assembly reaches the nanoscale and even the atomic scale. An
interfacial strain-controlled growth method has been reported to form
heterogeneous semiconductor nanorods with helical-shell morphology,
which induce the development of chirality in quantum dots. The
chirality of carbon dots can be engineered by choosing an appropriate
chiral surface precursor. Hybrid self-assembly will further integrate
diverse functionalities into a nanostructure.

Despite the rapid progress and advances already made in SPS, there
are large remaining gaps, from the controlled formation of a diverse 
range of nanomaterials to the understanding of their fine-tunable 
photophysical properties. Filling these gaps involves challenges and 
opportunities for nanoscale and atomic-scale optical characterizations. 
Here, we discuss eight potential directions in advancing SPS. 


Super-resolution SPS 

The optical-diffraction limit will continue to constrain resolutions in
both the lateral and axial directions in advanced SPS. The solution of
super-resolution SPS employs current super-resolution microscopic
techniques, which can resolve multiple single nanoparticles and strains
that are near each other or localize single emitters within
nanomaterials. Hell and co-workers developed ground-state depletion
nanoscopy to resolve 50 ± 8 nm GaInP segments spaced in a GaP
nanowire, and found excellent agreement between the ground-state
depletion image and the SEM pattern. The recently developed
upconversion-stimulated emission depletion microscopy and its simplified
form may be used for decoding the 'barcodes' of heterogeneous
UCNPs.

Other super-resolution techniques may be suitable for SPS.
Mirror-enhanced axial-narrowing super-resolution can enhance the axial
resolution six times by simply replacing the microscope slide with a
mirror. A semitransparent metal film on the sample surface can be
used to modulate the excited-state lifetime and allows
nanometre-axial-resolution measurement. A hybrid method that combines optical and
electron microscopy provides a pathway to a resolution down to the
single-molecule scale.
The parallel developments of super-resolution techniques and new
luminescent nanoparticles mutually enhance each other. SPS advances
our understanding and ability to tune the blinking behaviour of single
nanoparticles, which has enabled the development of super-resolution
optical fluctuation imaging and stochastic optical reconstruction
microscopy. The interference of scattering patterns of single
nanoparticles has been recently employed for high-speed tracking
of single molecules in cells. Characterizations of the excitation and
emission dipoles of single nanoparticles can be used to resolve the
orientations of the nanodevice transport process by super-resolution
polarization microscopy.


Multi-modal correlative SPS

Correlative microscopy, which integrates SPS techniques, electron
microscopy characterization and manipulation methods, combines
multi-modal measurements to determine precisely how each of the
structural properties of a material determines its overall optical
behaviour. Individually, each of these methods provides incomplete, or often
even misleading, information.

Recent studies suggest that the luminescence properties associated
with carbon dots may also originate from molecular fluorophores
or their aggregates that are unavoidably formed during carbon dot
synthesis. The co-existence of luminescent molecules in solution
is undetectable under TEM. Fluorescence correlation spectroscopy,
which provides information on the diffusion of emitters in solution,
has allowed researchers to distinguish between nanoparticles and
molecular fluorophores.

Care needs to be taken when the measurement substrate also emits
luminescence that leads to deceptive results. This is often the case for
SPS, because background noise often competes with the signal from
individual nanoparticles. Defects in glass, depending on their type and
local chemical structure, can emit luminescence covering the whole
visible spectrum. What makes SPS measurements more challenging
is that optical emissive defects can be created as a result of intense
ultraviolet irradiation. The similarity of the optical properties of a
single luminescent defect in SiO₂ to those of a dye molecule requires
multi-modal correlative SPS to reliably distinguish between them.
The combination of measurements that probe the optical and structural
properties of emitters can unambiguously attribute fluorescence to
a certain origin.


Nanoscale tweezing 

Contactless trapping and tweezing of individual nanometre-sized
nanoparticles, in combination with SPS, will provide many opportunities
for assembling hybrid nanoparticle-based devices and in situ
studies of distance- and orientation-dependent phenomena, such as
energy transfer and force dynamics between different types of single
nanoparticle building blocks. Optical trapping has been applied to
confine ultracold atoms and nanoparticles using a tightly focused
laser beam. However, conventional (optical) tweezers face considerable
challenges in tweezing nanoparticles, because of the limited
trapping force against Brownian motion and thermophoretic forces.
New methods involving nanotweezers and nanophotonic devices (for
example, microstructure photonic fibre) have been recently developed.
The hybrid electrothermoplasmonic nanotweezer enables
on-demand, long-range and rapid delivery of single nano-objects to
specific plasmonic nanoantennas to achieve two-dimensional assembly.
Near-field nanotweezers with specifically designed nanoarchitecture
(for example, a bowtie plasmonic aperture engineered at
the extremity of a tapered metal-coated optical fibre) could be used
to realize three-dimensional optical manipulation of single dielectric
objects. Other approaches, such as open-access microcavities, may
provide both in situ calibration and sensing capabilities during trapping
of single nanoparticles. Cohen and Moerner developed an
anti-Brownian electrophoretic trap that applies two-dimensional force
fields to trap single nanoscale objects in solution.


Surface of single nanoparticles 
Non-uniformity of surface species and charge can induce heterogeneity
in the optical properties of nanoparticles and affect nearly all their
projected applications, from specific molecular targeting to nanodevice
self-assembly. SPS has often been insufficient for obtaining chemical
information on the surface of a single nanoparticle. Methods such as
far-field Raman spectroscopy provide perspective in characterizing
the surface species of single nanoparticles. Very recently, Xiong and
co-workers developed stimulated Raman excited fluorescence
spectroscopy to perform surface molecule analysis with high sensitivity and
chemical specificity. They demonstrated all-far-field single-molecule
Raman spectroscopy and imaging without the need of plasmonics for
near-field enhancement.

The anisotropic surface of nanoparticles determines the heterogeneous
binding probabilities of ligands on different crystal facets
of a nanoparticle. Surface anisotropy can be resolved by using
super-resolution stochastic optical reconstruction microscopy to resolve
tip-end-bound dyes on single upconversion nanorods.


Light absorption by single nanoparticles

Fluorescence-based characterizations of single nanoparticles with low
quantum yield, or quenched nanoparticles that are close to metallic
surfaces or in chemical contact with a quencher, are not accessible
to single-nanoparticle sensitivity. These 'dark' nanoparticles can be
potentially detected by measuring their absorption of light. However,
the several-orders-of-magnitude difference between the absorption
cross-section of nanometre-sized nanoparticles and the
diffraction-limited focal spot makes it nearly impossible to detect the fraction
of excitation light that is absorbed by a particle. Sandoghdar and
co-workers detected the fraction of light that is absorbed by a single
semiconductor quantum dot from a noisy background. This was
achieved by normalization of the signal, which allows the reduction
of laser intensity fluctuations by an order of magnitude. Polarization
modulation of excitation light and the use of samples of lower surface
roughness have further improved the sensitivity of absorption imaging
and led to quantitative values for the absorption cross-section of
single molecules under ambient conditions.
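
The size of this mismatch can be estimated with a back-of-the-envelope calculation. The sketch below divides an assumed absorption cross-section by the area of a diffraction-limited spot whose diameter is taken as the Abbe limit, λ/(2 NA). The cross-section (10⁻¹⁵ cm², that is, 0.1 nm²), wavelength and numerical aperture are illustrative assumptions rather than values from the cited experiments.

```python
import math


def diffraction_limited_spot_area_nm2(wavelength_nm, numerical_aperture):
    """Area of a circular spot with diameter equal to the Abbe limit,
    wavelength / (2 * NA)."""
    diameter_nm = wavelength_nm / (2.0 * numerical_aperture)
    return math.pi * (diameter_nm / 2.0) ** 2


def absorbed_fraction(cross_section_nm2, wavelength_nm, numerical_aperture):
    """Rough fraction of the focused light removed by a single absorber:
    absorption cross-section divided by the focal-spot area."""
    spot_area = diffraction_limited_spot_area_nm2(wavelength_nm, numerical_aperture)
    return cross_section_nm2 / spot_area


# Assumed values: 1e-15 cm^2 = 0.1 nm^2 cross-section, 532 nm light, NA 1.4.
sigma_nm2 = 0.1
frac = absorbed_fraction(sigma_nm2, wavelength_nm=532.0, numerical_aperture=1.4)
print(f"absorbed fraction: {frac:.2e}")
```

With these numbers only a few parts per million of the light is absorbed, so laser intensity noise has to be suppressed well below that level before a single absorber becomes visible.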

Several other advanced methods have been explored for single-particle
absorption imaging and even spectroscopy measurements
(Fig. 4a). Confining an atom in a radio-frequency Paul trap has
achieved absorption imaging of a single laser-cooled atom. Using a
microtoroidal whispering-gallery-mode resonator, the resonance
shifts of the resonator could be translated into a photothermal sensitivity
of tens of picowatts, which is sufficient to measure the absorption
spectrum of a single molecule. Photothermal heating of a
temperature-sensitive substrate has enabled single-particle absorption imaging.
By using evanescent waves with a lightpath comparable with the
size of nanoparticles in total-internal-reflection microscopy, the
extinction spectroscopy of single particles can be measured. Bawendi
and co-workers have recently suggested using bow-tie antennas to
confine the excitation field within the tiny dimensions of a plasmonic
nanostructure. This confinement design can potentially enable further
enhancement of the accuracy of single-particle absorption measurements.


Quantum yield of single nanoparticles 

The challenge for SPS in measuring absolute quantum yields lies in
discerning the number of photons absorbed by a single particle. There are
alternative ways to measure quantum yield, which are based on the ratio
of the radiative (k_r) to the non-radiative (k_nr) transition rate between the
excited state(s) and the ground state of a single nanoparticle. Fine-tuning
of either the radiative or the non-radiative rate allows one to measure
the total de-excitation rate (k_r + k_nr), that is, the inverse excited-state
lifetime. Whereas the non-radiative rate of a fluorophore is typically
determined by its intrinsic properties and local chemical environment,
the probability of radiative de-excitation can be tuned by changing the
so-called local density of states of the electromagnetic field.

The radiative-rate change effect has been achieved for fluorophores
placed close to a dielectric interface, a sharp tip of a scanning
probe microscope, a metallic mirror, a metallic nanoparticle, or
between two gold nanoparticles or silver mirrors of an optical
resonator (Fig. 4b). Although such methods have been applied for organic
molecules, the enhanced photostability of nanoparticles enables more
precise measurements. A simplified approach has been reported by
Brokmann and co-workers, in which they measured the modulation of
the excited-state lifetime of a single quantum dot by placing a droplet
of a polymer on top of a substrate. This allowed them to estimate the
quantum yield of a single nanocrystal in the bright (on) state to be
close to unity.
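
The logic of such lifetime-modulation measurements can be sketched with a two-environment rate model. It assumes that changing the surroundings scales only the radiative rate, by a known factor (here called radiative_ratio, which in practice would come from an electrodynamic calculation of the local density of states), while the non-radiative rate stays fixed. Then 1/τ_a = k_r + k_nr and 1/τ_b = radiative_ratio × k_r + k_nr, which can be solved for k_r and hence the quantum yield k_r/(k_r + k_nr). All names and numbers below are illustrative and do not reproduce the analysis of the cited study.

```python
def quantum_yield_from_lifetimes(tau_a_ns, tau_b_ns, radiative_ratio):
    """Quantum yield in environment A from lifetimes measured in two
    environments, assuming k_nr is unchanged and k_r scales by
    radiative_ratio on going from A to B.

    Returns (quantum_yield, k_r, k_nr), with rates in inverse nanoseconds.
    """
    if radiative_ratio == 1.0:
        raise ValueError("the two environments must change the radiative rate")
    total_a = 1.0 / tau_a_ns          # k_r + k_nr
    total_b = 1.0 / tau_b_ns          # radiative_ratio * k_r + k_nr
    k_r = (total_b - total_a) / (radiative_ratio - 1.0)
    if k_r <= 0.0:
        raise ValueError("lifetimes are inconsistent with the assumed ratio")
    k_nr = total_a - k_r
    return k_r * tau_a_ns, k_r, k_nr


# Synthetic check: k_r = 0.04 /ns and k_nr = 0.01 /ns give tau_a = 20 ns.
# Raising k_r by a factor of 1.5 gives tau_b = 1 / 0.07 ns.
qy, k_r, k_nr = quantum_yield_from_lifetimes(20.0, 1.0 / 0.07, 1.5)
print(f"quantum yield: {qy:.2f}, k_r: {k_r:.3f} /ns, k_nr: {k_nr:.3f} /ns")
```

The approach needs no count of absorbed photons, which is why it sidesteps the main difficulty of absolute quantum-yield measurements on single particles; its accuracy rests on how well the radiative-rate ratio is known.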


High-throughput SPS and data analysis

Whereas SPS measurements are usually limited to nanoparticles with
sufficient brightness and the SPS method primarily relies on repeated
single-particle experiments to achieve statistical results, ensemble
methods can be used as a prescreening tool before selecting a smaller
number of samples for single-particle analysis. High-throughput SPS
and automation of data analysis are desirable for the implementation
of single-particle studies into routine sample analysis. A widefield
imaging scheme using a commercially available hyperspectral imaging
system or a prism to disperse spectral information can drastically
enhance the detection throughput and speed. The latter configuration
has been reported to capture the emission spectra of about
100 randomly distributed single molecules within snapshots of a few
milliseconds.

Machine (deep) learning can go beyond the limits of conventional
data analysis. Deep learning has been recently used in analysing
single-molecule patterns and to reconstruct a widefield image into
a super-resolution one without substantive implementation once the
computational model has been built. This indicates that repeated
experiments in SPS measurements may become unnecessary by using
deep learning to identify and record the optical signatures of single
nanoparticles.


SPS standardization

Many ensemble measurements of nanomaterials must be optimized
to become quantitative, given that the results obtained by different
research groups can be affected by different instrumentation settings
and measurement environments. For UCNPs in the ensemble form,
the enhancement factors in emission intensity and quantum yield of
inert-shell-coated UCNP samples can vary from a few times to a few
orders of magnitude, even when using similar design strategies. The quantum
yields of UCNPs are strongly dependent on the excitation power and
the particle density of UCNPs in powder or suspension.

To make these comparisons realistic, SPS can provide an absolute
number of emitted photons at different excitation power densities.
Until commercial suppliers can offer a less-complicated optics setup
for SPS (for example, a peripheral control package and accessories), a
metrology platform should be established to serve the materials
science community. Such a standardized platform would be particularly
important for nonlinear optical conversion, in which the illumination
power density substantially contributes to diverging electronic
behaviours. SPS standardization and accessibility will enable
the rapid search for efficient and uniform nanoparticles from various
synthesis methods, recipes or experimenters. These nanoparticles
can be selected for a variety of potential applications according to
their performance in terms of power-dependent intensities, intensity
distribution, and saturation in brightness.
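
One simple way to compare power-dependent intensities across samples is the slope of emission intensity against excitation power density on a log-log scale. For nonlinear emitters this slope is often used as an indicator of the effective order of the process at low power, and it flattens as the emission approaches saturation. The sketch below fits that slope by ordinary least squares; the function name and the synthetic, exactly quadratic data are illustrative assumptions, not measured values.

```python
import math


def loglog_slope(power_densities, intensities):
    """Least-squares slope of log10(intensity) versus log10(power density)."""
    if len(power_densities) != len(intensities) or len(power_densities) < 2:
        raise ValueError("need two or more paired measurements")
    xs = [math.log10(p) for p in power_densities]
    ys = [math.log10(i) for i in intensities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic example: photon counts that scale with the square of the
# excitation power density (W cm^-2), as for an ideal two-photon process.
powers = [1e2, 1e3, 1e4, 1e5]
counts = [1e-3 * p ** 2 for p in powers]
slope = loglog_slope(powers, counts)
print(f"log-log slope: {slope:.2f}")
```

Fitting the slope over separate power windows, rather than over the whole range, would show where a given particle leaves the power-law regime and begins to saturate.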
Fig. 4 | Perspective for advanced SPS. a-h, Photonics architectures adopted
for absorption (a-d) and quantum yield (e-h) measurements for single objects.
a, A laser-cooled Yb⁺ ion is confined in a radio-frequency (RF) Paul trap
formed by the electric quadrupole (dashed lines) between two tungsten
needles. Image adapted from ref." (Springer Nature). b, A gold nanoparticle
on a toroidal microresonator is pumped to generate a photothermal
absorption signal, which shifts the toroid resonance frequency. The resonance
frequency is probed with a fibre-coupled tunable-frequency laser. Image
adapted from ref." (Springer Nature). c, 10-nm gold nanoparticles are
measured on a silicon nitride drum under a tensile stress of 20 MPa for
absorption imaging. Image adapted from ref.™ (National Academy of
Sciences). d, Single-particle extinction spectroscopy by total internal
reflection. I_in(λ), intensity of input light; I_out(λ), intensity of output light; λ,
wavelength. Image adapted with permission from ref." (copyright 2019
Wiley-VCH Verlag GmbH & Co. KGaA). e, The decay rate of quantum dots is first
measured when they are close to a glass-air interface and then measured when
a polydimethylsiloxane (PDMS) drop is added to displace the interface far from


In summary, developments in materials science and nanophotonics
tools now offer the potential to achieve sensitivity and resolution better
than the single-nanoparticle level for the characterization of optical
nanomaterials. This will enable the controlled use of nanoparticles
with optimal properties for a broad range of applications.
A wide range of metals exhibit anomalous electrical and thermodynamic properties
when tuned to a quantum critical point (QCP), although the origins of such strange
metals have posed a long-standing mystery. The frequent association of strange
metals with unconventional superconductivity and antiferromagnetic QCPs has led
to the belief that they are highly entangled quantum states. By contrast,
ferromagnets are regarded as an unlikely setting for strange metals, because they are
weakly entangled and their QCPs are often interrupted by competing phases or
first-order phase transitions. Here we provide evidence that the pure ferromagnetic
Kondo lattice CeRh₆Ge₄ becomes a strange metal at a pressure-induced QCP.
Measurements of the specific heat and resistivity under pressure demonstrate that
the ferromagnetic transition is continuously suppressed to zero temperature,
revealing strange-metal behaviour around the QCP. We argue that strong magnetic
anisotropy has a key role in this process, injecting entanglement in the form of triplet
resonating valence bonds into the ordered ferromagnet. We show that a singular
transformation in the patterns of the entanglement between local moments and
conduction electrons, from triplet resonating valence bonds to Kondo-entangled
singlet pairs at the QCP, causes a jump in the Fermi surface volume, a key driver of
strange-metallic behaviour. Our results open up a direction for research into
ferromagnetic quantum criticality and establish an alternative setting for the
strange-metal phenomenon. Most importantly, strange-metal behaviour at a ferromagnetic
QCP suggests that quantum entanglement, not the destruction of
antiferromagnetism, is the common driver of the varied behaviours of strange
metals.


Quantum materials that are augmented by strong electronic correlations
are promising for various applications, but the electronic interactions
that empower these materials challenge our understanding.
One of the most pressing questions in strongly correlated electronic
systems is the origin of the strange-metal behaviour that develops at a
quantum critical phase transition between a delocalized Fermi liquid
and a localized or partially localized electronic phase. A prime example
is the strange-metal behaviour that develops in the normal state of
copper oxide superconductors at optimal doping, characterized by
a robust linear resistivity and a logarithmic temperature dependence
of the specific heat coefficient; similar behaviour is also observed
in various quantum critical heavy electron materials. The underlying
universality of strange-metal behaviour that develops in the vicinity
of QCPs is currently a subject of intense theoretical interest. One of
the valuable ways of identifying the key ingredients of strange-metal
behaviour is through experiments that explore new classes of quantum
materials.


Kondo lattice systems, which have periodically arranged atoms hosting
localized f electrons, show a rich variety of properties, owing to
competition between magnetic interactions among local moments
and their magnetic screening by conduction electrons, the so-called
Kondo effect. The small energy scales of these interactions lead to
highly tunable ground states, which is ideal for studying strange-metal
behaviour. In a variety of systems, tuning this competition leads to a
continuous suppression of antiferromagnetic order at a QCP. However,
the outcome when a ferromagnetic (FM) transition is suppressed by
a non-thermal tuning parameter is generally different. FM QCPs are
usually avoided, owing to the occurrence of a first-order transition,
the intersection of antiferromagnetic phases, or a Kondo cluster
glass phase. This raises the question of whether antiferromagnetic
correlations are crucial for realizing strange-metal behaviour.

Early theoretical studies of itinerant ferromagnets in the framework
of Hertz-Millis-Moriya theory predicted that quantum phase transitions
in these materials inevitably become first-order as a consequence
of interactions between the critically scattered electron fields, thereby
interrupting the development of quantum criticality. However, the
recent discovery of an FM QCP in the heavy-fermion system YbNi₄P₂
when tuned by chemical pressure raised the possibility that the FM QCP
in these systems is governed by a different universality class involving a
breakdown of Kondo screening. The negative pressure required to
reach the FM QCP of YbNi₄P₂ necessarily involves chemical doping of the
stoichiometric compound, which introduces disorder, complicating the
theoretical interpretation. Disorder suppresses first-order transitions,
as in the case of ZrZn₂, in which early experiments suggested the presence
of an FM QCP, but improved sample quality led to a first-order
transition. Therefore, although the experimental data on YbNi₄P₂ suggest
the existence of FM QCPs, definitive proof of such behaviour in a
quantum ferromagnet requires using hydrostatic instead of chemical
pressure. Cerium-based heavy-fermion ferromagnets, in which pressure
can cleanly tune the system to a QCP, are ideally suited for such studies.

Fig. 1 | Crystal structure and physical properties of CeRh₆Ge₄ at ambient
pressure. a, Crystal structure of CeRh₆Ge₄. The red, blue and yellow atoms
denote Ce, Rh and Ge, respectively. Left, the structure perpendicular to the
a-b plane, where the Ce atoms have a hexagonal arrangement. Right, the
structure perpendicular to the chain direction (c axis). b, c, The resistivity ρ(T)
(with the current parallel to the c axis; b) and specific heat as C/T (c) versus T
for CeRh₆Ge₄, in zero field and various fields applied in the a-b plane.
d, Temperature dependence of the magnetization of CeRh₆Ge₄, as M/H, in a
field of 1 mT applied both along the c axis and in the a-b plane, where the data
for the c axis field are scaled by a factor of 100. e, Low-field magnetization loops
for fields within the a-b plane at three temperatures. Below T_C, these exhibit
hysteresis loops typical of FM order, whereas at 3 K no hysteresis is observed.
f.u., formula unit.

CeRh₆Ge₄ is a heavy-fermion ferromagnet with a Curie temperature
T_C = 2.5 K. The crystal structure (Fig. 1a) consists of triangular lattices of
cerium stacked along the c axis. The Ce-Ce separation is much smaller
along the c axis (3.86 Å) than in the triangular planes (7.15 Å), suggesting
a quasi-one-dimensional nature to the magnetism. Under hydrostatic
pressure, we find that the FM transition of CeRh₆Ge₄ is smoothly suppressed
to zero temperature, reaching a QCP at p_c = 0.8 GPa.

The temperature dependence of the resistivity ρ(T) and the specific
heat (as C(T)/T) of single-crystalline CeRh₆Ge₄ both show transition
anomalies at around T_C = 2.5 K (Fig. 1b, c). When magnetic fields are
applied within the a-b plane, the transition becomes a broadened
crossover, consistent with FM ordering. The low-temperature magnetization
divided by the applied field, M/H, is shown in Fig. 1d. Measurements
up to 300 K demonstrate that the magnetic easy direction
lies within the a-b plane (Extended Data Fig. 1). On cooling, just above
T_C the in-plane M/H undergoes a marked enhancement, typical of FM
order. For fields along the c axis, M/H abruptly increases at the transition.
Magnetization loops below T_C for in-plane fields show hysteresis
that is characteristic of FM materials (Fig. 1e). M(H) increases rapidly
at low fields, reaching 0.28 μ_B per Ce atom for μ₀H = 0.017 T at 0.44 K
(μ_B, Bohr magneton; μ₀, magnetic constant). Upon further increasing
the field, there is no hysteresis between up and down field sweeps, and
M(H) changes slowly, indicating that 0.28 μ_B per Ce atom corresponds
to the ordered moment (Extended Data Fig. 1).

The zero-field resistivity and specific-heat coefficient at various
pressures are displayed in Fig. 2a, b (see also Extended Data Figs. 3,
4). The evolution of the properties with pressure (p) and the resulting
T-p phase diagram are presented in Fig. 3a, b. At T_C, the resistivity
changes from a linear T-dependent behaviour at high temperature to
a T²-dependent behaviour at low temperatures (Extended Data Fig. 3),
where C(T)/T becomes temperature independent. The FM transition,
which is suppressed almost linearly by pressure, cannot be detected
beyond p_c = 0.8 GPa. In the paramagnetic phase above p_c, the aforementioned
low-T properties of a Fermi liquid are again observed (Extended
Data Figs. 3, 4). The temperature at which this Fermi-liquid behaviour
onsets (T_FL) increases almost linearly with pressure (Fig. 3b). Both the
value of the low-temperature C(T)/T and the coefficient A of the resistivity
in ρ(T) = ρ₀ + AT² (ρ₀, residual resistivity) show an incipient divergence
when approaching p_c from the FM or paramagnetic side (Fig. 3a). On
both Fermi-liquid sides of the phase diagram, the Kadowaki-Woods
ratio A/γ² (γ, Sommerfeld coefficient) is 1.49 × 10⁻⁶ at ambient pressure
and 1.33 × 10⁻⁶ μΩ cm mol² K² mJ⁻² at 1.12 GPa, which are close to the
value for a 4f-electron ground-state degeneracy N = 4.
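
The Fermi-liquid analysis described above, a fit of ρ(T) = ρ₀ + AT² followed by the ratio A/γ², can be sketched in a few lines. This is only an illustration of the arithmetic, not the fitting procedure actually used for the data; the function names and all numerical inputs are illustrative placeholders, not measured values.

```python
import numpy as np

def fit_fermi_liquid(T, rho):
    """Least-squares fit of rho(T) = rho0 + A*T**2; returns (rho0, A)."""
    # A straight-line fit of rho against T**2: slope is A, intercept is rho0.
    A, rho0 = np.polyfit(np.asarray(T, float) ** 2, np.asarray(rho, float), 1)
    return rho0, A

def kadowaki_woods(A, gamma):
    """Kadowaki-Woods ratio A / gamma**2 (units follow those of the inputs)."""
    return A / gamma ** 2

# Synthetic placeholder data: rho in micro-ohm cm, T in K (not measured values).
T = np.linspace(0.05, 0.5, 20)
rho = 1.6 + 0.9 * T ** 2
rho0, A = fit_fermi_liquid(T, rho)

# Placeholder Sommerfeld coefficient in mJ mol^-1 K^-2 (hypothetical).
kw = kadowaki_woods(A, 800.0)
```

With A in μΩ cm K⁻² and γ in mJ mol⁻¹ K⁻², the ratio comes out in μΩ cm mol² K² mJ⁻², the unit quoted in the text.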

At p_c = 0.8 GPa, the resistivity is strictly linear in temperature over two
orders of magnitude down to at least 40 mK, whereas C(T)/T ∝ log(T*/T)
over nearly an order of magnitude with T* = 2.3 K (T* is a characteristic
temperature of the spin fluctuation energies); see Fig. 2c. At 60 mK,
C(T)/T reaches a very large value of 1.1 J mol⁻¹ K⁻². Between the FM and
paramagnetic phases, there is a fan-shaped strange-metal region with
properties similar to canonical antiferromagnetic quantum critical
systems such as CeCu₆₋ₓAuₓ and YbRh₂Si₂. The pressure dependencies
of A and γ (Fig. 3a) follow the residual resistivity ρ₀, which also
develops a maximum at p_c, reflecting the presence of quantum critical
fluctuations (Extended Data Fig. 3).

At first glance, the strange-metal properties of CeRh₆Ge₄ might be
attributed to itinerant quantum criticality, because, aside from the
absence of a first-order phase transition, Hertz-Millis-Moriya theory
predicts a logarithmic Sommerfeld coefficient and a T-linear electron
scattering rate, naively equivalent to a T-linear resistivity. However, the
scattering off long-wavelength FM fluctuations does not relax electron
currents, and once this effect is included, ρ(T) is expected to follow
a T^(5/3) dependence at low temperature. A T-linear resistivity suggests
large-angle scattering, a feature typical of local fluctuations involving
a wide range of momenta. Moreover, the strength of the logarithmic
divergence in the specific-heat coefficient, from fitting C/T with (S₀/T*)
log(T*/T), shows that a large fraction of the local moment entropy,
S₀ = (1/10)R log 2 (where R is the gas constant), is released over a
temperature scale T* (ref. 4). By contrast, the itinerant Hertz-Millis-Moriya
theory predicts S₀ = (q₀/q_F)³, where q₀ is the momentum cutoff of the
itinerant magnetic fluctuations and q_F is the Fermi momentum
(Supplementary Information). Applying this theory to the data then requires
q₀ = q_F, which, by Fourier's theorem, implies that the critical spin
fluctuations are local. Together with the absence of a first-order phase
transition, these features provide strong evidence in favour of a local QCP.
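
As a rough consistency check of the entropy scale, the form C/T = (S₀/T*) log(T*/T) can be inverted using the quoted values C/T = 1.1 J mol⁻¹ K⁻² at 60 mK and T* = 2.3 K. This sketch assumes a natural logarithm and that the single 60 mK point lies on the fitted curve; the text obtains S₀ from a fit over a temperature range, so this is only an order-of-magnitude estimate.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1

def entropy_scale(c_over_t, temperature, t_star):
    """Invert C/T = (S0/T*) * ln(T*/T) for S0 (J mol^-1 K^-1)."""
    return c_over_t * t_star / math.log(t_star / temperature)

# Values quoted in the text: C/T = 1.1 J mol^-1 K^-2 at 60 mK, T* = 2.3 K.
S0 = entropy_scale(1.1, 0.060, 2.3)

# Fraction of the full spin-1/2 local-moment entropy R ln 2.
fraction = S0 / (R_GAS * math.log(2))
```

Under these assumptions the fraction comes out at roughly one tenth of R log 2, in line with the value stated above.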


Fig. 2 | Pressure evolution of ferromagnetism in CeRh₆Ge₄ and strange-metal
behaviour at the QCP. a, Resistivity of CeRh₆Ge₄ under various hydrostatic
pressures. The FM transition is suppressed by pressure, and is no longer
observed at p_c = 0.8 GPa (red line). Inset, derivative of ρ(T) at lower pressures;
the peak position corresponds to T_C. b, Specific heat of CeRh₆Ge₄ under
hydrostatic pressures. The bulk FM transition is suppressed with pressure, as
indicated by the vertical arrows showing the position of T_C. For clarity, not all
the data points are displayed. The error bars shown are representative of the
scattering of the data at low temperature. A transition to Fermi-liquid
behaviour at low temperatures can be observed on either side of p_c, where
C(T)/T flattens. c, ρ(T) and C(T)/T at p_c = 0.8 GPa. ρ(T) exhibits linear behaviour
extending from 5 K down to at least 40 mK (dotted line), whereas C(T)/T
continues to increase with decreasing temperature, exhibiting a dependence
proportional to log(T*/T).


In antiferromagnetic heavy-electron metals, the development of
T-linear resistivity at the lowest temperatures coincides with an abrupt
jump in the Fermi surface volume, accompanied by singular charge
fluctuations. It has been argued that such a jump in the Fermi surface
is caused by an abrupt transformation in the pattern of spin entanglement,
as the Kondo singlets transform into resonating valence bonds
(RVBs) in the spin fluid. This poses a problem, because the spins in a
simple ferromagnet are not entangled, which would imply a continuous
evolution of the Fermi surface. As shown below, a clue to unravelling
this puzzle comes from the unusual aspect that CeRh₆Ge₄ develops a
strange-metal phase at an FM QCP, similar to that observed for the
non-stoichiometric material YbNi₄P₂₋ₓAsₓ.

Apart from their quasi-one-dimensional nature, a common feature
of these two materials is an easy-plane anisotropy. In such systems, the
magnetic-order parameter is no longer conserved and will develop
marked zero-point fluctuations, which are probably responsible for
the severely reduced magnetic moment. This can be seen clearly in
a two-site example where the magnetization is perpendicular to the
quantization (z) axis of the spins. The ordered phase is a product state
that can be expanded in terms of triplets,

$$|{\rightarrow}_i {\rightarrow}_j\rangle = \tfrac{1}{2}\left(|{\uparrow}_i {\downarrow}_j\rangle + |{\downarrow}_i {\uparrow}_j\rangle\right) + \tfrac{1}{2}\left(|{\uparrow}_i {\uparrow}_j\rangle + |{\downarrow}_i {\downarrow}_j\rangle\right),$$

where i and j are site indices. An easy-plane anisotropy projects out the
equal-spin pairs on the right-hand side, creating a triplet valence bond.
In a lattice, the same effect creates a quantum superposition of triplet
pairs, forming a triplet RVB state, |tRVB⟩. Hence, easy-plane anisotropy
in FM systems has the same role as magnetic frustration in antiferromagnetic
systems, injecting a macroscopic entanglement into the
ground state. This leads us to hypothesize that the strange-metal behaviour
at the FM QCP has its origins in the magnetic anisotropy.
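
The two-site argument can be checked numerically. The sketch below builds the product state of two spin-1/2 moments polarized along x (perpendicular to the quantization axis z), removes the equal-spin components, and compares the entanglement entropy before and after. Treating the easy-plane anisotropy as a hard projector is an idealization made here for illustration, and all variable names are hypothetical.

```python
import numpy as np

up = np.array([1.0, 0.0])
down = np.array([0.0, 1.0])
right = (up + down) / np.sqrt(2)          # spin-1/2 polarized along +x

# Two-site product state; basis order: |uu>, |ud>, |du>, |dd>.
product = np.kron(right, right)

# Idealized easy-plane projector: removes the equal-spin pairs |uu> and |dd>.
projector = np.diag([0.0, 1.0, 1.0, 0.0])
projected = projector @ product
projected = projected / np.linalg.norm(projected)

# The m = 0 triplet (|ud> + |du>)/sqrt(2), i.e. a triplet valence bond.
triplet0 = (np.kron(up, down) + np.kron(down, up)) / np.sqrt(2)

def entanglement_entropy(state):
    """Von Neumann entropy of one site, from the Schmidt decomposition."""
    schmidt = np.linalg.svd(state.reshape(2, 2), compute_uv=False)
    probs = schmidt ** 2
    probs = probs[probs > 1e-12]
    return float(-(probs * np.log(probs)).sum())

s_product = entanglement_entropy(product)      # unentangled product state
s_projected = entanglement_entropy(projected)  # maximally entangled bond
```

The product state carries no entanglement, whereas the projected state is the m = 0 triplet with entropy log 2, which is the sense in which anisotropy injects entanglement into the ordered state.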

To test these ideas, we have studied a simplified Kondo lattice model
with nearest-neighbour FM couplings with easy-plane anisotropy of
the form $-J^{xy}_{ij}(S^x_i S^x_j + S^y_i S^y_j) - J^{z}_{ij} S^z_i S^z_j$ on a tetragonal lattice, consisting
of spin chains along the c direction with weak inter-chain couplings
(see Supplementary Information). Here, $S^{x,y,z}_i$ are the x, y and z components
of the spin at site i, and $J^{xy}_{ij}$ and $J^{z}_{ij}$ are the magnetic couplings
between the spins at sites i and j. When the chains are weakly coupled,
our simulations indicate the development of a second-order phase
transition, whereas at higher couplings a first-order phase transition
develops. This feature is in agreement with the current observations
of FM QCPs developing in quasi-one-dimensional systems. We assume
$J^{xy} > J^{z}$, which has a dual effect: it converts the model into an easy-plane
x-y ferromagnet, and generates triplet RVBs. Also, the anisotropy
changes the magnetic dispersion at low momenta from quadratic to
linear (see Supplementary Information). By switching on the Kondo
screening we can then tune the model to the QCP.
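
To see how an easy-plane coupling favours a triplet bond, the anisotropic exchange can be diagonalized exactly for a single pair of spin-1/2 moments. This two-site toy problem is not the lattice model or the Schwinger-boson calculation described in the text; the coupling values below are arbitrary illustrative choices with the in-plane coupling larger than the z coupling.

```python
import numpy as np

# Spin-1/2 operators (hbar = 1).
sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2

def two_site_hamiltonian(j_xy, j_z):
    """H = -J_xy (Sx Sx + Sy Sy) - J_z Sz Sz for one pair of spins."""
    return (-j_xy * (np.kron(sx, sx) + np.kron(sy, sy))
            - j_z * np.kron(sz, sz))

# Arbitrary illustrative couplings with easy-plane anisotropy (J_xy > J_z > 0).
H = two_site_hamiltonian(1.0, 0.5)
energies, states = np.linalg.eigh(H)   # eigenvalues in ascending order
ground = states[:, 0]

# Reference state: m = 0 triplet (|ud> + |du>)/sqrt(2) in the basis uu, ud, du, dd.
triplet0 = np.array([0.0, 1.0, 1.0, 0.0]) / np.sqrt(2)
overlap = abs(np.vdot(triplet0, ground))
```

For these couplings the non-degenerate ground state is the m = 0 triplet with energy −J_xy/2 + J_z/4, lying below the equal-spin pair states at −J_z/4; in the isotropic limit the three triplet states become degenerate.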

Our calculations take advantage of a Schwinger-boson representation
of the magnetic moments, which enables us to examine both
the magnetic and Kondo-screened parts of the phase diagram, and
the QCP that links them together (Fig. 3c). The key feature of this
approach is a representation of the spins as bosonic spinons, enabling
a dynamical description of the Kondo effect in which neutral local
moments fractionalize into negatively charged electrons, leaving
behind positively charged Kondo singlets. In the ordered phase, a
majority of the moments are aligned, although some form triplet
RVB pairs with their neighbours. In an isotropic ferromagnet, the
continuous growth of magnetization away from the QCP indicates
a continuous change in the fraction of Kondo-screened moments,
or a continuous evolution of the Fermi surface. However, when the
moments entangled within tRVB states are abruptly released into
the Fermi sea, we find (see Supplementary Information) that there
is a jump in the Fermi surface volume. The resulting QCP is a plasma
in which the Kondo singlets, the electrons and the RVB bonds are in
a state of critical dynamical equilibrium, giving rise to singular spin
and charge fluctuations as well as a specific-heat coefficient that is
logarithmic in temperature (Supplementary Information), in agreement
with our experimental results.
Our findings of a pressure-induced QCP in CeRh₆Ge₄ demonstrate
that an FM system can develop a continuous quantum phase transition
in the absence of disorder, a result that at present can only be
understood in the framework of local quantum criticality, where
Kondo screening is suppressed to zero at the QCP. The observation
of strange-metal behaviour at finite temperatures above the QCP
(that is, linear resistivity and a specific-heat coefficient that is
logarithmically divergent in T) expands the scope of this phenomenon to
encompass ferromagnets. Central to the strange-metal behaviour in
a ferromagnet is a small abrupt jump in the Fermi surface volume. An
experimental observation of such a jump would be an unambiguous
test of Kondo breakdown, because there is no unit-cell doubling at an
FM phase transition.

Fig. 3 | Phase diagram of CeRh₆Ge₄ under pressure. a, Pressure dependence of
the A coefficient of the T² term from the resistivity and Sommerfeld coefficient γ
(as C/T at 60 mK), which shows a pronounced maximum near the QCP. The
error bars for the A coefficient are smaller than the symbols. For γ, the errors
correspond to the scattering of the low-T data. The dashed line is a guide to the
eye. b, T-p phase diagram of CeRh₆Ge₄. The circles, triangles and squares
for pressures below p_c denote T_C derived from the resistivity, specific heat (d.c.
method), and a.c. heat capacity (Extended Data Fig. 5), respectively. The
corresponding symbols above p_c mark T_FL, below which Fermi-liquid behaviour
occurs. The FM transition is suppressed by pressure until the system reaches a
QCP at p_c ≈ 0.8 GPa. Below T_C, and at higher pressures below T_FL, Fermi-liquid
ground states develop. The colours denote the exponent of ρ(T) calculated as
n = d(log(ρ − ρ₀))/d(log T), where the Fermi-liquid states with n = 2 are dark
blue, and the strange-metal phase near the QCP with n = 1 is shown in pink.
c, Schematic representation of different phases. In the ordered phase (left),
most of the spins are ordered in the plane, although some have RVB bonds. The
Fermi surface is small, as represented by the volume of the conduction sea. In
the paramagnetic Fermi-liquid phase (right), all the spins are 'ionized' to form
heavy electrons that expand the Fermi sea. A background of positively charged
singlets is left behind. At the QCP (centre), the system is in a dynamical critical
equilibrium in which the moments are fluctuating and the Kondo screening by
the conduction electrons competes with RVBs for the entanglement. In this
region, critical fluctuations strongly scatter the conduction electrons.
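
The resistivity exponent used to colour the phase diagram of Fig. 3b, n = d(log(ρ − ρ₀))/d(log T), can be estimated from tabulated data with a numerical logarithmic derivative. The sketch below is one possible implementation applied to synthetic curves; it is not necessarily how the published colour map was produced, and the coefficients are placeholders.

```python
import numpy as np

def resistivity_exponent(T, rho, rho0):
    """Local exponent n = d log(rho - rho0) / d log T via finite differences."""
    T = np.asarray(T, float)
    rho = np.asarray(rho, float)
    return np.gradient(np.log(rho - rho0), np.log(T))

# Synthetic placeholder curves (not measured data), T in K.
T = np.logspace(-1, 0, 50)
n_fermi_liquid = resistivity_exponent(T, 1.6 + 0.9 * T ** 2, 1.6)   # expect n = 2
n_strange_metal = resistivity_exponent(T, 1.6 + 2.0 * T, 1.6)       # expect n = 1
```

On real data the result is sensitive to the choice of ρ₀ and to noise, so some smoothing before differentiation would usually be needed.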

Finally, spin-triplet superconducting pairing states have been
proposed in FM heavy-fermion systems such as UGe₂ and URhGe.
Although there is no sign of superconductivity in CeRh₆Ge₄ down
to 40 mK, it is probable that at sufficiently low temperatures the
triplet RVB states that are already present in the critical regime
will migrate into the conduction band as a triplet superconducting
condensate.
Methods


Crystal growth and characterization

Needle-shaped single crystals of CeRh₆Ge₄ were grown using a bismuth
flux. The elements were combined in a molar ratio of Ce:Rh:Ge:Bi of
1:6:4:150, and sealed in an evacuated quartz tube. The tube was heated
and held at 1,100 °C for 10 h, then cooled at 3 °C per hour to 500 °C. The
tube was then removed, and centrifuged to remove the excess bismuth.
The orientation of the crystals was determined using single-crystal
X-ray diffraction, and the chemical composition was confirmed using
energy-dispersive X-ray spectroscopy. The samples measured under
pressure had typical values of ρ₀ = 1.6 μΩ cm and a residual-resistance
ratio of RRR = ρ(300 K)/ρ(0.3 K) ≈ 45 (Extended Data Fig. 2).


Physical property measurements

Magnetization measurements were performed using a Magnetic Property
Measurement System (Quantum Design). The heat capacity at
ambient pressure was measured down to 0.4 K in applied magnetic
fields up to 14 T, using a Quantum Design Physical Property Measurement
System (PPMS) with a ³He insert, using the standard relaxation
method. Specific-heat experiments under pressure were carried out
using a copper beryllium piston-cylinder-type pressure cell. The
sample and a piece of lead as pressure gauge were put in a Teflon capsule
together with Fluorinert serving as a liquid pressure-transmitting
medium. The capsule was then mounted inside the pressure cell. The
heat capacity of the whole assembly was determined by a compensated
heat-pulse method in a dilution refrigerator (Oxford Instruments) down
to temperatures of 60 mK. To obtain the heat capacity of the sample,
the addenda were recorded in a separate measurement run and
subtracted for each pressure from the data obtained for the whole setup
including the sample. The pressure inside the cell was determined by the
pressure-induced shift of the superconducting transition temperature
of the piece of lead, which was measured in a Magnetic Property Measurement
System (Quantum Design). The magnetic field was removed
in an oscillating fashion to reduce the remanent field (<3 Oe) of the
superconducting magnet. The remaining effect on the superconducting
transition temperature was compensated for by determining the
shift of the superconducting transition of the lead inside the pressure
cell with respect to a reference piece fixed to the outside. Electrical
transport and a.c. calorimetry measurements under pressure were
carried out in a piston-cylinder clamp-type cell with Daphne oil 7373
as a pressure-transmitting medium. The pressure was also determined
from the superconducting transition of lead. The resistivity was measured
using the four-contact configuration between 0.04 K and 300 K.
The measurements were performed down to 1.9 K, 0.4 K and 0.04 K in
a PPMS, ³He refrigerator and dilution refrigerator, respectively. Data
obtained from these measurements are all consistent.

Extended Data Fig. 1 | Magnetic susceptibility and field-dependent
magnetization. a, Temperature dependence of the magnetic susceptibility
(χ(T)) of CeRh₆Ge₄ in a field of 0.1 T applied both along the c axis and in the a-b
plane, where both axes are plotted on a logarithmic scale. χ(T) is anisotropic
across the whole temperature range: the a-b plane corresponds to the easy
direction. b, Magnetization loops measured at 3 K and 0.44 K, above and below
T_C, respectively. In the FM state, the magnetization increases rapidly at low
fields, reaching a value of around 0.28 μ_B per Ce atom, which probably
corresponds to the ordered moment, whereas at higher fields the
magnetization increases more slowly.

Extended Data Fig. 2 | Temperature-dependent resistivity at ambient
pressure. Temperature dependence of the resistivity (ρ(T)) of CeRh₆Ge₄ and
for the non-magnetic analogue LaRh₆Ge₄, with the current along the c axis. The
inset shows the magnetic contribution to the resistivity of CeRh₆Ge₄ (ρ_m),
obtained from subtracting the data of LaRh₆Ge₄. This exhibits a broad
maximum at around 80 K, probably as a consequence of both the crystalline
electric field and Kondo effects.

Extended Data Fig. 3 | Analysis of the resistivity under pressure. a, Low-temperature
ρ(T) of CeRh₆Ge₄ versus T² under pressures up to 0.69 GPa. For
clarity, the data at consecutive pressures are offset vertically by 0.2 μΩ cm. The
low-temperature data in the magnetic state were fitted with a quadratic
temperature dependence, ρ(T) = ρ₀ + AT², as shown by the solid black lines.
b, The corresponding derivative dρ(T)/dT, where the position of T_C was
determined at each pressure from the position of the maximum, as indicated
by the vertical arrows. a.u., arbitrary units. c, Low-temperature ρ(T) versus T² of
CeRh₆Ge₄ at pressures above the QCP; the data at consecutive pressures are
offset vertically by 0.02 μΩ cm. The solid lines show the quadratic temperature
dependence, indicating the occurrence of Fermi-liquid behaviour at low
temperatures. d, Low-temperature enlargement of ρ(T) − ρ₀ for two pressures
either side of the QCP, where the data at 0.69 GPa are vertically offset by
0.02 μΩ cm. e, Resistivity as a function of temperature plotted as δρ = ρ − ρ_FL
for various pressures p. ρ_FL is the Fermi-liquid contribution to the resistivity,
obtained from fitting the low-temperature ρ(T) with a quadratic temperature
dependence. The deviation of δρ from zero indicates the onset of non-Fermi-liquid
behaviour, and hence corresponds to T_FL, as marked by the vertical
arrows. f, Pressure dependence of the residual resistivity ρ₀, obtained from
analysing the low-temperature ρ(T) at various pressures, and where the error
bars are smaller than the symbol size. This quantity reaches a maximum around
the QCP.

Extended Data Fig. 4 | Analysis of the heat capacity under pressure.
a, Temperature dependence of the absolute value of the heat capacity as C/T, at
various pressures below p_c. For pressures up to 0.72 GPa, T_C can be detected, as
marked by the vertical arrows. At lower pressures this is determined from the
peak positions, whereas close to p_c it is determined by the intersection of the
solid lines indicated in the figure. b, The data for two pressures near p_c after
subtracting the data taken at 0.8 GPa to remove the logarithmic contribution
to C/T. In both cases, the peak position of ΔC/T is in good agreement with the
value of T_C obtained from a. c, Low-temperature C(T)/T for three pressures
above the QCP. The strong increase with decreasing temperature corresponds
to non-Fermi-liquid behaviour, whereas the flattening of C(T)/T at low
temperatures corresponds to the onset of Fermi-liquid behaviour. The position
of the temperature below which Fermi-liquid behaviour occurs, T_FL, is
highlighted by the vertical arrows, and is determined from the deviation from
the near-temperature-independent behaviour marked by the dashed lines.

Extended Data Fig. 5 | The a.c. heat capacity under pressure. The a.c. heat
capacity as C/T at various pressures up to 1.69 GPa. For pressures below
0.83 GPa, the position of T_C is marked by the vertical arrows. The dashed lines
show the construction used to determine T_C near p_c. At 0.83 GPa, no transition
is detected down to the lowest measured temperature, 0.3 K; instead, C/T
continues to increase with decreasing temperature. At 1.69 GPa, well above the
QCP, C/T shows little temperature dependence. a.u., arbitrary units.

Studies of two-dimensional electron systems in a strong magnetic field revealed the
quantum Hall effect, a topological state of matter featuring a finite Chern number
C and chiral edge states. Haldane later theorized that Chern insulators with integer
quantum Hall effects could appear in lattice models with complex hopping
parameters even at zero magnetic field. The ABC-trilayer graphene/hexagonal boron
nitride (ABC-TLG/hBN) moiré superlattice provides an attractive platform with which
to explore Chern insulators because it features nearly flat moiré minibands with a
valley-dependent, electrically tunable Chern number. Here we report the
experimental observation of a correlated Chern insulator in an ABC-TLG/hBN moiré
superlattice. We show that reversing the direction of the applied vertical electric field
switches the moiré minibands of ABC-TLG/hBN between zero and finite Chern
numbers, as revealed by large changes in magneto-transport behaviour. For
topological hole minibands tuned to have a finite Chern number, we focus on quarter
filling, corresponding to one hole per moiré unit cell. The Hall resistance is well
quantized at h/2e² (where h is Planck's constant and e is the charge on the electron),
which implies C = 2, for a magnetic field exceeding 0.4 tesla. The correlated Chern
insulator is ferromagnetic, exhibiting substantial magnetic hysteresis and a large
anomalous Hall signal at zero magnetic field. Our discovery of a C = 2 Chern insulator
at zero magnetic field should open up opportunities for discovering correlated
topological states, possibly with topological excitations, in nearly flat and
topologically nontrivial moiré minibands.


Moiré superlattices in van der Waals heterostructures have emerged
as a powerful tool for engineering quantum phenomena, because the
periodic moiré potential defines new length and energy scales.
Notably, nearly flat electronic bands can be realized in different moiré
superlattice systems, which offer exciting opportunities to realize a
wide variety of correlation physics. For example, correlated insulators
and superconductivity have been reported in magic-angle twisted
bilayer graphene and in ABC-TLG/hBN moiré superlattices, and
spontaneous ferromagnetism and an anomalous Hall effect, apparently
corresponding to an incipient Chern insulator, have been observed
in twisted bilayer graphene with an aligned hBN layer. Recent theories
suggest that correlated topological phenomena could emerge in
such graphene moiré superlattices, where a non-trivial band topology
coexists with the nearly flat moiré miniband. Pristine ABC-TLG,
because of its cubic band and therefore a rather flat dispersion at low
energy, can already exhibit strong correlations and can potentially
host spontaneous quantum Hall states. The moiré superlattice in the
ABC-TLG/hBN heterostructure further creates isolated flat moiré minibands,
which enhances the electron-electron correlation and topological
effects in the system. This ABC-TLG/hBN heterostructure provides a
particularly attractive platform with which to explore correlated topological
phenomena because not only the electron density but also the
bandwidth and topology of the moiré minibands can be conveniently
controlled by electrostatic gating.

Fig. 1 | ABC-TLG/hBN moiré superlattice and tunable Chern bands.
a, Schematic of the dual-gated ABC-TLG/hBN moiré superlattice Hall bar device
and measurement configuration. The inset shows that the moiré pattern exists
between ABC-TLG and bottom hBN. b, Colour plot of the longitudinal
resistivity ρ_xx as a function of V_t and V_b at T = 1.5 K. The arrows show the
direction of changing doping n and displacement field D, respectively. In
addition to the band insulating states (characterized by the resistance peaks) at
the charge neutral point (CNP) and fully filled point (FFP), tunable correlated
insulator states also emerge at 1/4 filling and 1/2 filling of the hole minibands at
large displacement field |D|. It has been predicted theoretically that the hole
miniband is topological (that is, Chern number C ≠ 0) for D < 0 and trivial (C = 0)
for D > 0. The inset shows the optical image of the device.

Here we report experimental observation of a correlated Chern
insulator and ferromagnetism in ABC-TLG/hBN. Upon tuning the
vertical displacement field, we show that the magneto-transport in
an ABC-TLG/hBN moiré superlattice exhibits distinct behaviours for
trivial minibands (C = 0) compared to topological minibands (C ≠ 0). A
correlated Chern insulator with C = 2 quantum anomalous Hall effect
emerges around 1/4 filling of the topological hole miniband when the
bandwidth is sufficiently narrowed by applying a displacement field
in one direction. The correlated Chern insulator spontaneously breaks
time-reversal symmetry, exhibiting strong ferromagnetic hysteresis
and a zero-field anomalous Hall resistance over 8 kΩ. The experimentally
observed C = 2 Chern band can be understood theoretically by
incorporating electron-electron interaction effects on the quasi-particle
band structure of ABC-TLG/hBN moiré minibands.
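
For orientation, the Hall resistance expected for a Chern insulator with Chern number C is h/(|C|e²). The short sketch below evaluates this for C = 2 using the exact SI values of h and e; the function name is illustrative and the input must be a non-zero Chern number.

```python
H_PLANCK = 6.62607015e-34    # Planck constant, J s (exact in the SI)
E_CHARGE = 1.602176634e-19   # elementary charge, C (exact in the SI)

def quantized_hall_resistance(chern):
    """Quantized Hall resistance h / (|C| e^2) in ohms for a non-zero Chern number C."""
    return H_PLANCK / (abs(chern) * E_CHARGE ** 2)

r_c2 = quantized_hall_resistance(2)   # plateau value for C = 2, in ohms
zero_field_fraction = 8.0e3 / r_c2    # the quoted 8 kOhm relative to that plateau
```

The C = 2 plateau is about 12.9 kΩ, so a zero-field anomalous Hall resistance of 8 kΩ corresponds to a little over 60% of the quantized value.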

We fabricated two ABC-TLG/hBN moiré superlattice devices following a previously reported method. In brief, the ABC-TLG domain is identified by scanning near-field infrared nanoscopy (Extended Data Fig. 1), and is isolated from adjacent ABA domains by atomic force microscope cutting. The isolated ABC-TLG is then encapsulated in exfoliated hBN crystals, where one hBN crystal is aligned with the ABC-TLG to form the moiré superlattice. The ABC-TLG/hBN heterostructures are fabricated into a Hall bar geometry with one-dimensional edge contacts, a metal top gate, and a degenerately doped silicon bottom gate following standard nanofabrication procedures. A schematic image and an optical image of device I (the data shown in the main text are from device I) are shown in the insets to Fig. 1a, b (and Extended Data Fig. 2). Gate voltages $V_t$ and $V_b$ are applied to the metal top gate and the Si bottom gate, respectively. The dual-gate configuration allows us to independently control the doping and the miniband bandwidth of the ABC-TLG/hBN heterostructure: the doping relative to the charge neutrality point is set by $n = (D_b - D_t)/e$, and the miniband bandwidth is tuned by the applied vertical displacement field $D = (D_b + D_t)/2$. Here $D_b = +\varepsilon_b (V_b - V_b^0)/d_b$ and $D_t = -\varepsilon_t (V_t - V_t^0)/d_t$ are the vertical displacement fields below and above the ABC-TLG/hBN moiré superlattice, respectively, $\varepsilon_{b(t)}$ and $d_{b(t)}$ are the dielectric constant and thickness of the bottom (top) dielectric layers, and $V_{b(t)}^0$ is the effective offset in the bottom (top) gate voltages caused by environment-induced carrier doping. The longitudinal resistivity $\rho_{xx}$ is obtained by $\rho_{xx} = (W/L)V_{xx}/I$, where $W = 1$ μm is the channel width and $L = 4$ μm is the channel length, and the Hall resistivity $\rho_{yx}$ is obtained by $\rho_{yx} = V_{yx}/I$ (the measurement configuration for the longitudinal and Hall voltages, $V_{xx}$ and $V_{yx}$, respectively, is shown in Fig. 1a).
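
The gate-voltage bookkeeping above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the symmetric gate stack in `GEOMETRY` and all numerical gate parameters are hypothetical placeholders rather than device data, and we assume that the displacement fields are quoted as $D/\varepsilon_0$ (in V per metre), so that $\varepsilon_0$ has to be restored when converting to a carrier density.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C
EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m


def gate_fields(v_t, v_b, eps_t, eps_b, d_t, d_b, v_t0=0.0, v_b0=0.0):
    """Return (D_t, D_b) in V/m, using the sign convention of the text."""
    d_field_b = +eps_b * (v_b - v_b0) / d_b
    d_field_t = -eps_t * (v_t - v_t0) / d_t
    return d_field_t, d_field_b


def doping_and_displacement(v_t, v_b, **geometry):
    """Return (n in m^-2, D in V/m) for one pair of gate voltages."""
    d_field_t, d_field_b = gate_fields(v_t, v_b, **geometry)
    # Assumption: D is quoted as D/eps0, so eps0 is restored to get a density.
    n = EPS0 * (d_field_b - d_field_t) / E_CHARGE
    d_avg = 0.5 * (d_field_b + d_field_t)
    return n, d_avg


def hall_bar_resistivities(v_xx, v_yx, current, width=1e-6, length=4e-6):
    """Return (rho_xx, rho_yx) in ohms from the measured voltages and current."""
    return (width / length) * v_xx / current, v_yx / current


# Hypothetical symmetric gate stack (placeholder values, not device data).
GEOMETRY = dict(eps_t=4.0, eps_b=4.0, d_t=40e-9, d_b=40e-9)

n, d_avg = doping_and_displacement(-1.0, 1.0, **GEOMETRY)
print(f"n = {n:.3e} m^-2, D = {d_avg * 1e-9:.3f} V/nm")
```

With identical top and bottom dielectrics, opposite gate voltages change only $D$ and equal gate voltages change only $n$, which is the independent control described above.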

As the voltages applied to the gates are tuned, measurements of $\rho_{xx}$ reveal several resistance peaks (Fig. 1b) in the ABC-TLG/hBN device across the parameter space controlled by $V_t$ and $V_b$. In addition to the peaks corresponding to band insulating states at the charge neutrality point and fully filled point, tunable correlated insulator states emerge at 1/4 filling and 1/2 filling of the hole miniband (that is, one and two holes per moiré unit cell) when a finite displacement field $|D|$ narrows the moiré minibands. There are two apparent asymmetries of the correlated insulator states in ABC-TLG/hBN at 1/4 and 1/2 charge filling: between the electron and hole minibands, and between positive and negative $D$. Prominent correlated insulator states are observed in the hole minibands but not in the electron minibands because the hole miniband has a much smaller bandwidth for finite $|D|$. The asymmetry between the positive and negative $D$ fields arises from the fact that the moiré superlattice exists only between the ABC-TLG and the bottom hBN in this device (Fig. 1a). Interestingly, the direction of the displacement field has been predicted to determine not only the relative bandwidth but also the topology of the hole miniband. For a device with moiré superlattice between ABC-TLG and the bottom (top) hBN, a positive (negative) $D$ leads to a trivial hole miniband with $C = 0$ and smaller bandwidth, while a negative (positive) $D$ leads to a topological hole miniband with $C \neq 0$ and larger bandwidth. The difference in bandwidth and topology is reflected in the electrical transport behaviour: in the device with the moiré superlattice at the bottom hBN, we observe stronger correlated insulators in the trivial hole miniband with positive $D$, because holes are easier to localize in the narrower trivial band than in the broader topological band with negative $D$. Our previous studies have shown that superconductivity can emerge when we dope the 1/4 filling correlated insulator state in selected parameter spaces of the trivial hole miniband.

Fig. 2 | Quantum Hall effect from the correlated $C = 2$ Chern insulator. a, c, Colour plot of $\rho_{xx}$ (a) and $\rho_{yx}$ (c) as a function of carrier density and magnetic field for the topological hole miniband at $D = -0.5$ V nm$^{-1}$ and $T = 0.06$ K. The experimental data at $T = 1.5$ K are qualitatively similar. b, d, Corresponding $\rho_{xx}$ (b) and $\rho_{yx}$ (d) plots for the trivial hole miniband at $D = 0.4$ V nm$^{-1}$ and $T = 1.5$ K. $n_0$ corresponds to the carrier density of the 1/4 filling of the first miniband. No quantum Hall signatures are present in the trivial hole miniband, whereas a $\nu = 2$ quantum Hall effect characterized by a minimum of $\rho_{xx}$ and a quantized $\rho_{yx}$ emerges from 1/4 filling of the topological hole miniband. e, f, Horizontal line cuts of the colour plots in c and a, respectively. e shows that $\rho_{yx}$ is well quantized beyond $B = 0.4$ T. An offset of 2.5 kΩ is applied for each stack in f. g, Line cut of $\rho_{xx}$ and $\rho_{yx}$ along the quantum Hall state (denoted by the dashed lines in a and c) shows that $\rho_{yx}$ reaches a quantized value of $\nu = 2$ at 0.4 T, and a large $\rho_{yx}$ persists down to zero field. It represents a quantum anomalous Hall state for the $C = 2$ correlated Chern insulator at 1/4 filling. The inset shows a zoomed-in plot of $\rho_{yx}$ at small magnetic field.

To better probe the topological aspects of the moiré minibands, we turn to magneto-transport studies. At $D = 0$, the correlation effect in the system is relatively weak, and the magneto-transport data exhibit well defined quantum Hall states and a Landau fan diagram at low magnetic fields (see Extended Data Fig. 4), demonstrating the very high quality of our sample. With large displacement field, the moiré miniband bandwidth becomes narrower, and the dominant electron-electron interaction dramatically changes the magneto-transport behaviour. Figure 2a, b displays the colour plot of $\rho_{xx}$ as a function of the hole doping and the vertical magnetic field $B$ for $D = -0.5$ V nm$^{-1}$ (at $T = 0.06$ K) and $D = 0.4$ V nm$^{-1}$ (at $T = 1.5$ K), respectively. Figure 2c, d shows the corresponding Hall resistivity $\rho_{yx}$ data. Experimental data for $D = -0.5$ V nm$^{-1}$ at 1.5 K exhibit qualitatively similar behaviour to those at 0.06 K (see Extended Data Fig. 3). We have used $n_0 = 5.25 \times 10^{11}$ cm$^{-2}$ as the unit of carrier density, which corresponds to one hole per moiré lattice site (that is, 1/4 filling). The magneto-transport data exhibit distinct behaviours for the topological moiré miniband at negative $D$ and the trivial miniband at positive $D$. Specifically, a strong quantum Hall state emerges from the 1/4 filling point for $D = -0.5$ V nm$^{-1}$ but not for $D = 0.4$ V nm$^{-1}$. The dashed line in Fig. 2a traces the minimum in $\rho_{xx}$ following the relation $n = \nu e B/h$ for $\nu = 2$. This quantum Hall state is well developed at very low magnetic fields, and originates from the 1/4 filling resistive state at zero magnetic field (Fig. 2a). At the same time, $\rho_{yx}$ is very large at weak magnetic fields and exhibits a jump in value when the magnetic field switches sign across $B = 0$ T (Fig. 2c). By contrast, stronger correlated insulator states are observed for $D = 0.4$ V nm$^{-1}$, but no signatures of quantum oscillations or quantum Hall effects are present (Fig. 2b). In addition, Fig. 2d shows that the Hall resistivity signal tends to be rather small for all hole doping at $D = 0.4$ V nm$^{-1}$. (The relatively large $\rho_{yx}$ signals at 1/4 and 1/2 fillings are artefacts caused by crosstalk from the large $\rho_{xx}$ of the correlated insulator states, and they do not change sign when the magnetic field is reversed.)
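
As a rough numerical illustration of the relation $n = \nu e B/h$ that the dashed line follows, the sketch below evaluates how far the density of the $\nu = 2$ state moves with field. The function name is ours, and the value of $n_0$ is our reading of the unit of carrier density quoted above, so the ratio should be treated as indicative only.

```python
PLANCK = 6.62607015e-34     # Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C

# Assumed unit of carrier density (one hole per moire cell), in m^-2.
N0 = 5.25e11 * 1e4


def density_shift(nu, b_tesla):
    """Carrier density (m^-2) tracked by a state of filling factor nu at field B."""
    return nu * E_CHARGE * b_tesla / PLANCK


shift = density_shift(2, 0.4)
print(f"shift at 0.4 T: {shift:.3e} m^-2 ({shift / N0:.3f} n0)")
```

The shift at 0.4 T is only a few per cent of $n_0$, which is why the state appears to emanate from the 1/4 filling point on the scale of the colour plots.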

Figure 2e, f shows $\rho_{yx}$ and $\rho_{xx}$ as a function of density for a few representative magnetic field values, corresponding to horizontal line cuts in Fig. 2c and a, respectively. $\rho_{yx}$ is well quantized for magnetic fields larger than 0.4 T at the value of 13.0 ± 0.2 kΩ; that is, the expected quantized value of $h/2e^2 = 12.9$ kΩ is within the empirical uncertainty. $\rho_{xx}$ exhibits a corresponding minimum in the quantum Hall state, with a minimum resistivity less than 60 Ω at 2 T. Figure 2g further displays $\rho_{xx}$ and $\rho_{yx}$ as a function of the magnetic field along the quantum Hall state following the dashed line in Fig. 2c, with the inset showing a zoomed-in plot of $\rho_{yx}$ between 0 and 0.2 T. $\rho_{yx}$ smoothly reaches the quantized value at 0.4 T. $\rho_{yx}$ maintains a large though not quantized value all the way to zero magnetic field, and a large jump of $\rho_{yx}$ is observed when the magnetic field changes sign.
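
The quantization check quoted above is simple arithmetic, sketched here with SI constants. The helper names are ours; the measured value and its uncertainty are the ones stated in the text.

```python
PLANCK = 6.62607015e-34     # Planck constant, J s
E_CHARGE = 1.602176634e-19  # elementary charge, C


def quantized_hall_resistance(nu):
    """Hall resistance h/(nu e^2) in ohms for filling factor (Chern number) nu."""
    return PLANCK / (nu * E_CHARGE ** 2)


def within_uncertainty(measured, uncertainty, expected):
    """True if the expected value lies inside measured +/- uncertainty."""
    return abs(measured - expected) <= uncertainty


expected = quantized_hall_resistance(2)
print(f"h/2e^2 = {expected / 1e3:.2f} kOhm")
print("consistent with 13.0 +/- 0.2 kOhm:", within_uncertainty(13.0e3, 0.2e3, expected))
```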
Fig. 3 | Anomalous Hall effect and ferromagnetism. a, Magnetic-field-dependent $\rho_{yx}$ at 1/4 filling and $D = -0.5$ V nm$^{-1}$ at different temperatures. The Hall resistivity displays a clear anomalous Hall signal with strong ferromagnetic hysteresis. At the base temperature of $T = 0.06$ K, the anomalous Hall signal can be as high as $\rho_{yx}^{\mathrm{AH}} = 8$ kΩ and the coercive field is $B_c = 30$ mT. The inset shows the extracted coercive field $B_c$ and anomalous Hall signal $\rho_{yx}^{\mathrm{AH}}$ as a function of temperature. b, The evolution of $\rho_{xx}$, $B_c$ and $\rho_{yx}^{\mathrm{AH}}$ as a function of hole doping at $D = -0.5$ V nm$^{-1}$, $T = 0.06$ K. The strongest anomalous Hall signal is observed close to $n = n_0$. c, The evolution of $\rho_{xx}$, $B_c$ and $\rho_{yx}^{\mathrm{AH}}$ as a function of the displacement field $D$ at $n = n_0$, $T = 0.06$ K. The strongest anomalous Hall signal is observed when the device is most insulating (that is, largest $\rho_{xx}$).

The $\nu = 2$ quantum Hall state at 1/4 filling at $D = -0.5$ V nm$^{-1}$ cannot be explained by a conventional integer quantum Hall effect from single-particle Landau levels. Instead, we argue that it represents a quantum anomalous Hall state from a correlated Chern insulator. First, this quantum Hall state only exists at negative $D$, where the miniband is predicted to have a non-trivial Chern number, and is absent at positive $D$, where the band is predicted to be trivial. Second, it is well established that the lowest single-particle Landau level in ABC-TLG should be a $\nu = 3$ state owing to a winding number of 3 close to the valence band maximum. Third, only one quantum Hall state is observed anywhere, and the quantized Hall resistivity appears to start at very low magnetic field. If the observed quantum Hall state of 1/4 filling at $D = -0.5$ V nm$^{-1}$ were from the lowest single-particle Landau level, similar Landau levels should also exist close to the charge neutrality point and the 1/2 filling correlated insulators, and higher Landau levels should be observable. (See Extended Data Fig. 4 for a single-particle Landau fan diagram in the same device, where $D = 0$ and the electron correlation is weak.) Finally, an apparent non-zero quantum Hall-like gap was observed for the 1/4 filling Chern insulator state down to $B = 0$ T; the size of the gap continuously increases with increasing $B$ (see Methods and Extended Data Fig. 5). All our data can be naturally explained by a $\nu = 2$ Chern insulator state at 1/4 filling. Such a $C = 2$ correlated Chern insulator should feature quantized Hall resistivity $\rho_{yx}$ and a corresponding magnetic-field-dependent carrier density based on the Streda formula. This Chern insulator at 1/4 filling is a strongly correlated state that breaks the valley degeneracy and fills only the $C = 2$ electronic band in one valley. The nearly flat and tunable moiré minibands in the ABC-TLG/hBN moiré heterostructure are critical for the realization of such a correlated topological state.

The correlated Chern insulator, persisting to zero magnetic field, spontaneously breaks the time-reversal symmetry and can generate valley-flavour ferromagnetism at 1/4 filling. Indeed, ferromagnetism and strong anomalous Hall signals emerge from the Chern insulator state at zero magnetic field. Figure 3a shows the temperature-dependent Hall resistivity when a small perpendicular $B$ is swept between -0.1 T and 0.1 T. The Hall resistivity displays a clear anomalous Hall signal with strong ferromagnetic hysteresis. At $B = 0$ T, $\rho_{yx}$ is non-zero and depends on the magnetic field sweep direction, a defining ferromagnetic feature. At the base temperature of $T = 0.06$ K, the anomalous Hall signal reaches a maximum of $\rho_{yx}^{\mathrm{AH}} = 8$ kΩ and a coercive field as large as $B_c = 30$ mT. The inset in Fig. 3a shows the temperature dependence of $\rho_{yx}^{\mathrm{AH}}$ and $B_c$: both signals decrease monotonically with increasing temperature, reaching zero at $T = 3.5$ K. The zero-magnetic-field $\rho_{yx}^{\mathrm{AH}}$ is already close to 12.9 kΩ. An almost perfect quantization of the $C = 2$ quantum anomalous Hall Chern insulator appears at a magnetic field as low as 0.4 T.
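
One way to extract the anomalous Hall signal and coercive field from a hysteresis loop is sketched below. The definitions used here are assumptions on our part (half the zero-field splitting between the two sweep directions for $\rho_{yx}^{\mathrm{AH}}$, and half the separation of the two zero crossings for $B_c$); the text does not spell out the extraction procedure, and the tanh-shaped loop is synthetic, not measured data.

```python
import numpy as np


def zero_crossing(b, rho):
    """Linearly interpolate the first field at which rho changes sign."""
    idx = np.where(np.diff(np.sign(rho)) != 0)[0][0]
    b0, b1, r0, r1 = b[idx], b[idx + 1], rho[idx], rho[idx + 1]
    return b0 - r0 * (b1 - b0) / (r1 - r0)


def loop_parameters(b, rho_up, rho_down):
    """Return (rho_AH, B_c) from up-sweep and down-sweep Hall traces.

    b must be ascending; rho_up and rho_down are sampled on the same grid.
    """
    zero_field_split = np.interp(0.0, b, rho_down) - np.interp(0.0, b, rho_up)
    rho_ah = 0.5 * abs(zero_field_split)
    b_c = 0.5 * abs(zero_crossing(b, rho_up) - zero_crossing(b, rho_down))
    return rho_ah, b_c


# Synthetic loop: 8 kOhm saturation and 30 mT coercive field (placeholders).
b = np.linspace(-0.1, 0.1, 401)
rho_up = 8000.0 * np.tanh((b - 0.03) / 0.005)
rho_down = 8000.0 * np.tanh((b + 0.03) / 0.005)

rho_ah, b_c = loop_parameters(b, rho_up, rho_down)
print(f"rho_AH ~ {rho_ah:.0f} Ohm, B_c ~ {b_c * 1e3:.1f} mT")
```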
Fig. 4 | Calculated Chern number including the electron-electron interaction effects. a, b, Calculated single-particle band structure of the ABC-TLG/hBN moiré superlattice for $\Phi = -25$ meV and 25 meV, respectively. Here $\Phi$ is the energy difference between the top and bottom layers of ABC-TLG, and 25 meV corresponds to the vertical displacement field around 0.5 V nm$^{-1}$. The red line highlights the topological hole miniband for $\Phi = -25$ meV. c, Calculated Chern number of the hole miniband as a function of the energy difference $\Phi$ and the effective dielectric constant $\varepsilon$ after including the electron-electron interaction effects using the Hartree-Fock approximation. The resulting band Chern number can be 2 for parameters close to the experimental device, where $\Phi = -25$ meV and $\varepsilon = 4$.

The ferromagnetism is tunable by $n$ and $D$ and appears only in a limited parameter space of $n$ and $D$. In Fig. 2e, $\rho_{yx}$ near $n = n_0$ presents different signs at $B = -4$ mT and 6 mT, which is much smaller than $B_c$. For this measurement, the magnetic field is fixed and carrier density is swept from a non-ferromagnetic state to a ferromagnetic state, which leads to $\rho_{yx}$ with different signs even in small positive and negative magnetic fields. A clearer $n$-dependence of $\rho_{xx}$, $B_c$ and $\rho_{yx}^{\mathrm{AH}}$ at $D = -0.5$ V nm$^{-1}$ at the base temperature is shown in Fig. 3b by sweeping the magnetic field at different fixed $n$. $B_c$ and $\rho_{yx}^{\mathrm{AH}}$ both have maximum values close to $n = n_0$. However, $\rho_{yx}^{\mathrm{AH}}$ shows a stronger carrier doping dependence and decreases to almost zero at $n = n_0 \pm 0.2 n_0$, while $B_c$ decreases to zero at $n = n_0 \pm 0.35 n_0$. $\rho_{xx}$ displays an unusual behaviour with both a resistance peak and a resistance dip close to $n = n_0$, the origin of which requires further experimental and theoretical study. Figure 3c shows the $D$-dependence of $\rho_{xx}$, $B_c$ and $\rho_{yx}^{\mathrm{AH}}$ at $n = n_0$. $\rho_{xx}$ shows a maximum at $D = -0.5$ V nm$^{-1}$, which might be due to the narrowest bandwidth and strongest correlation effects at this displacement field. $\rho_{yx}^{\mathrm{AH}}$ also shows a maximum at $D = -0.5$ V nm$^{-1}$, suggesting the importance of electron-electron correlations to the observed anomalous Hall signal. A finite $\rho_{yx}^{\mathrm{AH}}$ can be observed with $D$ between -0.3 V nm$^{-1}$ and -0.57 V nm$^{-1}$. A non-zero $B_c$ is present in the same $D$ range, although the maximum $B_c$ appears at $D = -0.45$ V nm$^{-1}$.

The observed $C = 2$ correlated Chern insulator can be understood theoretically from the topological moiré minibands when the electron-electron interactions are considered. Previous theoretical calculations predict a valley Chern number $C = 3$ for the single-particle hole miniband for negative $D$, but our results suggest that interaction effects can renormalize the valley Chern number. Figure 4a, b shows the single-particle band structures of the lowest few moiré minibands in ABC-TLG/hBN moiré superlattices for positive and negative displacement fields. For the negative $D$ values (supporting a non-zero valley Chern number), the valence band overlaps with the remote lower band (see Fig. 4a). We incorporate the interaction effects in Hartree-Fock theory. When the valence band is close to the band below (at large $|D|$) or when the interaction strength is sufficiently strong (with small dielectric constant), the self-energy corrections mix the valence band and the lower band, leading to a reduction of the Chern number to $C = 2$. As shown in Fig. 4c, when the dielectric constant is around 4 (effective screening from the dielectric constant of hBN), the valley Chern number is expected to be 2 for a large range of displacement field values.

Our observation of a tunable $C = 2$ Chern insulator in the ABC-TLG/hBN moiré superlattice provides an opportunity to explore correlated topological states in van der Waals moiré heterostructures. For example, fractional Chern insulators and non-Abelian states could emerge from strong correlations in nearly flat topological minibands once the quality of moiré heterostructures is further improved. In particular, the flat $C = 2$ Chern band has the potential to host novel fractional Chern insulator states beyond the fractional quantum Hall paradigm.
Methods


Transport measurements

The ultralow temperature measurements were performed in a dilution refrigerator. Low-temperature electronic filtering, including microwave filters, low-pass resistor-capacitor filters, and thermal meanders, is used to anchor the electron temperature as well as to prevent quasiparticle excitations from high-frequency noise. Stanford Research Systems SR830 lock-in amplifiers with NF Corporation LI-75A voltage preamplifiers are used to measure the resistivity of the device with an alternating-current bias current of 0.5 nA at a frequency of 7 Hz.


The nature of the correlated insulator at 1/4 filling for the topological side

Because of the Wannier obstruction caused by the valley Chern number, a standard Mott insulator with localized charge is impossible. Therefore, the physics on the topological side is essentially different from that on the trivial side, despite the similarity between the band structures. Because a narrow Chern band is analogous to a Landau level, the physics on the topological side is similar to that of quantum Hall systems with spin and valley degeneracies. At 1/4 filling, when the band is sufficiently flat, a single fully filled spin- and valley-polarized Chern band is favoured, similar to the 'quantum Hall ferromagnetism' found in Landau levels. A valley-polarized Chern insulator matches the current transport experiment quite well. Ideally, a fully filled Chern band leads to quantized Hall conductivity $\sigma_{xy} = 2e^2/h$. At zero magnetic field, domains formed by the two degenerate valleys can cause a finite $\rho_{xx}$ on the order of $h/e^2$. Upon increasing the magnetic field to about 0.4 T, the valley Zeeman coupling can align the domains and lead to perfect quantization of the Hall conductivity.

Ferromagnetism and an anomalous Hall effect have also recently been observed in a near-magic-angle twisted bilayer graphene moiré superlattice at 3/4 filling of the conduction miniband, where an hBN cladding layer appears to be aligned with the proximate graphene. These effects are likely to have a similar origin in both systems, where electron-electron interactions create a spontaneous valley polarization in the nearly flat and topological moiré minibands. However, the Hall resistance in the incipient Chern insulator state of that twisted bilayer graphene device is not quantized even under a finite applied magnetic field, possibly owing to substantial twist-angle disorder.


Calculation for ABC-TLG/hBN moiré superlattice
A simple argument from earlier theoretical work shows that the valley Chern number must jump by 3 when switching the direction of the displacement field if the superlattice gap does not close during this process. A direct numerical calculation gives $C = 3$ for the $D < 0$ side. Thus, owing to the interactions, a symmetry-breaking state with spin-valley polarization is stabilized, and then a Chern insulator with $C = 3$ is expected. We first improve our theoretical modelling of the band structure by adding various remote-hopping terms at the single-particle level and incorporating interaction effects, as shown in Extended Data Fig. 6. As we will show, $C = 3$ is still robust in our more sophisticated model of the single-particle band structure. Interaction effects turn out to be necessary to explain the reduction of the Chern number.

The ABC-TLG is modelled by a six-band model. We use the following
parameters”: (V, y, Yo, ¥s, ¥4)= (2,676, 380, 8.3,260, 104) meV. 

Then the Hamiltonian for the valley + is: 
The aligned hBN layer at the bottom provides a moiré hopping term which folds the original band structure into a small mini Brillouin zone. We use the same model of the moiré hopping term as in earlier work.

Using the above model we get the band structures shown in Fig. 4. For $\Phi_V = -25$ meV (where the subscript V indicates 'vertical') we get a narrow valence band with Chern number $|C| = 3$. We have tried to change the various hopping parameters and the potential difference. However, the Chern number is always equal to 3 and we conclude that $C = 2$ cannot be reproduced at the single-particle level.

In Fig. 4b, we can see that although at each $\mathbf{k}$-point the valence band is isolated from the band below, they overlap in energy (in other words, although there is a direct gap there is no indirect gap). More precisely, the system is in a compensated semimetal phase at the fully filled point. This is in agreement with the experimental measurement. On the $D < 0$ side, electrons are pushed away from the aligned bottom hBN layer. Therefore the moiré superlattice potential has a weaker effect and the superlattice gap is small.

Given that the valence band is not isolated from the remote band below, an interaction-induced self-energy can renormalize the band structure and maybe even the band topology. To incorporate this effect, we perform a self-consistent Hartree-Fock calculation by keeping only the valence band and the remote band. The interacting Hamiltonian is
where $a = \pm$ is the valley index and $m = 0, 1$ is the band index, labelling the valence band and the remote band. $\sigma = \uparrow, \downarrow$ is the spin index. $c^{\dagger}_{a,m,\sigma}(\mathbf{k})$ is the creation operator corresponding to the band $m$ for the valley $a$ and spin $\sigma$. The two terms in the Hamiltonian are the kinetic term and the interaction term. $V(\mathbf{q})$ is the screened Coulomb interaction controlled by the renormalization factor of the dielectric constant. In the interaction we have included the form factors $\Lambda$ to incorporate the Berry curvature of the Bloch wavefunctions. Then the Hartree-Fock self-energy can be obtained from the self-consistent equations:
We solve these equations by iterating from zero initial values. Then we add the self-energies to calculate the new Chern number. The result is summarized in Fig. 4c. When the dielectric constant is large, the Chern number is 3, the same as in the non-interacting case. For a fixed displacement field, increasing the interaction strength (decreasing the dielectric constant) can reduce the Chern number to 2 through a topological transition. For a large parameter region, $C = 2$ is indeed expected, consistent with the current experiment.
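
To make concrete how a band Chern number can be evaluated numerically once Bloch states are known, the sketch below uses the lattice (plaquette) method of Fukui, Hatsugai and Suzuki on a generic two-band toy model (the Qi-Wu-Zhang model). Both the method and the model are substitutions chosen for illustration: this is not the six-band ABC-TLG/hBN Hamiltonian, it includes no Hartree-Fock self-energy, and the text does not say which numerical scheme was actually used.

```python
import numpy as np

# Pauli matrices.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def toy_hamiltonian(kx, ky, mass):
    """Two-band toy Bloch Hamiltonian (Qi-Wu-Zhang); illustrative only."""
    return (np.sin(kx) * SX + np.sin(ky) * SY
            + (mass + np.cos(kx) + np.cos(ky)) * SZ)


def lower_band_chern(mass, n_grid=24):
    """Chern number of the lower band from gauge-invariant plaquette phases."""
    ks = 2.0 * np.pi * np.arange(n_grid) / n_grid
    states = np.empty((n_grid, n_grid, 2), dtype=complex)
    for i, kx in enumerate(ks):
        for j, ky in enumerate(ks):
            _, vecs = np.linalg.eigh(toy_hamiltonian(kx, ky, mass))
            states[i, j] = vecs[:, 0]  # eigh sorts eigenvalues ascending
    total_phase = 0.0
    for i in range(n_grid):
        for j in range(n_grid):
            ip, jp = (i + 1) % n_grid, (j + 1) % n_grid
            # Product of overlaps around one plaquette; gauge phases cancel.
            loop = (np.vdot(states[i, j], states[ip, j])
                    * np.vdot(states[ip, j], states[ip, jp])
                    * np.vdot(states[ip, jp], states[i, jp])
                    * np.vdot(states[i, jp], states[i, j]))
            total_phase += np.angle(loop)
    return int(round(total_phase / (2.0 * np.pi)))


for mass in (-1.0, 1.0, 3.0):
    print(f"mass = {mass:+.1f}: C = {lower_band_chern(mass)}")
```

The overall sign of the result depends on orientation conventions; what carries over to the discussion above is that the integer changes only when the gap to the neighbouring band closes, which is the kind of topological transition described for decreasing dielectric constant.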

Finally, we discuss the nature of the observed Chern insulator. At 1/4 filling, there is one particle per moiré unit cell. Within Hartree-Fock theory, the most natural ground state is a spin- and valley-polarized Chern insulator with $C = 2$, which is consistent with the transport measurement.

Because the two valleys are degenerate at zero magnetic field, domains can exist and cause the non-quantization of the Hall conductivity owing to the chiral edge mode in the domain boundary. Upon increasing the magnetic field, the valley Zeeman coupling can align the domains and lead to perfect quantization. In the current experiment, the quantization is achieved at only 0.2 T, which suggests that the valley Zeeman coupling is large, consistent with previous theoretical calculations.

The anomalous Hall effect in Fig. 3 strongly suggests that the valley is polarized. Although the simplest ansatz within mean field theory also requires spin polarization, the present transport measurement cannot rule out more exotic Chern insulator phases with the spins in a disordered (for example, 'spin liquid') or anti-ferromagnetic phase. Even for a simple spin-polarized scenario, non-trivial topological defects in spin space may have an important role. The skyrmion excitation carries charge $Q = 2e$ in the Chern insulator and may be the cheapest charge excitation (at small field the cheapest charge excitations may also be valley flips). In this case the activation gap is decided by the skyrmion gap. The existence of skyrmions may be reflected in a large $g$ factor (skyrmions involve many spin flips) for the response of the activation gap to magnetic field. We leave it to future experiments to probe the possibly interesting physics associated with the spin texture.


Insulating behaviour and extracted gap of the Chern insulator state

We measured the magnetic-field-dependent energy gap of the $\nu = 2$ state. Extended Data Fig. 5 shows the temperature-dependent longitudinal resistivity $\rho_{xx}$ and conductivity $\sigma_{xx} = \rho_{xx}/(\rho_{xx}^2 + \rho_{yx}^2)$ at different magnetic fields $B$ for the $\nu = 2$ state. For all magnetic fields, the temperature dependencies in Extended Data Fig. 5a, b exhibit typical behaviours of the quantum Hall insulator in that both $\rho_{xx}$ and $\sigma_{xx}$ decrease with decreasing temperature, which differs from a trivial insulator behaviour where the decrease in $\sigma_{xx}$ is accompanied by a divergence in $\rho_{xx}$. This quantum-Hall-like behaviour persists at zero magnetic field. Extended Data Fig. 5c shows the energy gap at different magnetic fields, obtained by fitting the data to an Arrhenius activation model of $\sigma_{xx} \propto e^{-\Delta/2k_BT}$. The deviation from the Arrhenius behaviour at the low- and high-temperature limits is possibly due to variable range hopping at low temperatures and strong thermal excitations at high temperatures. We extract a non-zero energy gap of about 2 K at $B = 0$. We can clearly see a continuous increase in gap size with increasing magnetic field from $B = 0$, and the non-zero intercept at $B = 0$. This indicates that the quantum Hall behaviour extends to zero magnetic field, consistent with the Chern insulator state.
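
The gap extraction described above amounts to a straight-line fit of $\ln \sigma_{xx}$ against $1/T$. The sketch below shows one way to do this; it assumes the activation form $\sigma_{xx} \propto e^{-\Delta/2k_BT}$ with $\Delta$ expressed in kelvin, the function names are ours, and the data are synthetic (generated with a 2 K gap), not the measured curves. In practice the fit would be restricted to the intermediate temperature window where the Arrhenius form holds.

```python
import numpy as np


def sigma_xx(rho_xx, rho_yx):
    """Longitudinal conductivity from the resistivity tensor components."""
    return rho_xx / (rho_xx ** 2 + rho_yx ** 2)


def arrhenius_gap(temperatures_k, sigma):
    """Gap in kelvin from a linear fit of ln(sigma) against 1/T.

    Assumes sigma ~ exp(-gap / (2 T)), so the fitted slope equals -gap / 2.
    """
    inv_t = 1.0 / np.asarray(temperatures_k, dtype=float)
    slope, _intercept = np.polyfit(inv_t, np.log(np.asarray(sigma)), 1)
    return -2.0 * slope


# Synthetic activated conductivity with a 2 K gap (placeholder, not data).
temps = np.linspace(0.5, 3.0, 12)
sigma = 1e-5 * np.exp(-2.0 / (2.0 * temps))
print(f"fitted gap = {arrhenius_gap(temps, sigma):.3f} K")
```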


Ferromagnetism and Chern insulator in a second device
Similar ferromagnetism and Chern insulator data have been observed in a second ABC-TLG/hBN device. Extended Data Fig. 7 shows the basic characterization of the second device (device II, Extended Data Fig. 7a). The moiré exists between the top hBN and ABC-TLG for device II (Extended Data Fig. 7b, opposite to that of device I in the main text). This leads to a non-trivial band at positive displacement fields and a trivial band at negative displacement fields. Extended Data Fig. 7c shows the Mott insulating states at 1/4 and 1/2 fillings of the trivial band for negative displacement fields, and weak resistance peaks of the non-trivial band for the positive displacement fields.

By tuning the gate voltages to the non-trivial side at the positive displacement field near 1/4 filling, we reproduced the main data of device I in the main text. Extended Data Fig. 8 shows the main results for device II. At $D = 0.55$ V nm$^{-1}$, an anomalous Hall hysteresis loop is clearly resolved. We show the $n$- and $D$-dependence of the anomalous Hall resistivity $\rho_{yx}^{\mathrm{AH}}$ and the coercive field $B_c$ in Extended Data Fig. 8b. The fans of $\rho_{xx}$ and $\rho_{yx}$ in Extended Data Fig. 8c, d show the clear $\nu = 2$ state developing from 1/4 filling, as represented by the dashed lines. (The contact resistances are much larger in device II, which leads to much larger measurement noise. It also prevents us from measuring the magneto-transport at the lowest dilution fridge temperatures owing to the increased contact resistances at low temperature.)


Data availability 


The data that support the findings of this study are available from the corresponding authors upon reasonable request.
Extended Data Fig. 1 | Identification of ABC-TLG. a, Atomic force microscope topography image of an exfoliated TLG on SiO$_2$/Si. b, Near-field infrared image corresponding to a, showing that ABC-TLG has different contrast to ABA-TLG.


Extended Data Fig. 2 | Optical images of device during fabrication. a, ABC-TLG is identified by near-field infrared spectroscopy and isolated by atomic force microscope tip. b, ABC-TLG is encapsulated by hBN and etched into Hall bar geometry. c, Final device with metal contacts and top and bottom gates.
Extended Data Fig. 3 | Magneto-transport of the Chern insulator state at $T = 1.5$ K. a, b, Colour plots of $\rho_{xx}$ and $\rho_{yx}$ as a function of carrier density and magnetic field at $D = -0.5$ V nm$^{-1}$ and $T = 1.5$ K. The $\nu = 2$ Chern insulator state is well resolved at 1.5 K, which features a minimum for $\rho_{xx}$, and a quantized $\rho_{yx}$ emerges from 1/4 filling. c, d, Horizontal line cuts of a and b, respectively. $\rho_{yx}$ shows quantized Hall resistance at finite magnetic field.
Extended Data Fig. 4 | Landau fan at $D = 0$. Longitudinal resistivity $\rho_{xx}$ (colour scale) as a function of carrier density and magnetic field at displacement field $D = 0$. Clear Landau levels develop from the charge neutrality point and fully filled points at $D = 0$, which is direct evidence of the high quality of the encapsulated ABC-TLG device described in the main text. The first resolved quantum Hall state of the charge neutrality point is $\nu = 6$. This Landau fan diagram establishes conclusively that we have ABC trilayer graphene in the hBN-encapsulated device; it is completely different from the Landau fan diagram of ABA trilayer graphene.
Extended Data Fig. 5 | Temperature dependence of the $\nu = 2$ state. a-c, Arrhenius plot of longitudinal resistivity (a), conductivity (b) and the estimated gap at different magnetic fields (c). A manual offset of -0.1 S on the y axis is applied to each curve in a and b. The gap size in c is extracted from the linear fit of $\sigma_{xx} \propto e^{-\Delta/2k_BT}$ (red line) in b. We note that the Arrhenius plot is only valid for a limited temperature range, suggesting deviation from the thermally activated behaviour at low temperatures. Therefore, the estimated gaps have relatively large uncertainty. However, the qualitative behaviour is robust: insulating behaviour is observed at all magnetic fields, and the quantized Hall insulator at finite magnetic field connects smoothly with the anomalous Hall insulator at zero magnetic field, supporting the identification of the state as a Chern insulator.
Extended Data Fig. 6 | Illustration of the ABC-TLG/hBN system. The bottom BN layer is nearly aligned with the graphene layers whereas the one on top is not aligned. A and B refer to the two sublattices in each of the graphene layers.
Extended Data Fig. 7 | Basic characterizations of the second device (device II). a, Optical image of device II. The device is in a standard Hall bar geometry with top and bottom gates. The scale bar is 3 μm. b, Schematic of the moiré pattern existing between top hBN and ABC-TLG for device II. c, Two-dimensional colour plot of $R_{xx}$ as a function of $V_t$ and $V_b$ at $T = 5$ K. The moiré exists between the top hBN and ABC-TLG for device II, opposite to that of device I in the main text. This leads to a non-trivial band at positive displacement fields and a trivial band at negative displacement fields.
Extended Data Fig. 8 | Reproducible Chern insulator data for device II. a, Ferromagnetic anomalous Hall effect at 1/4 filling at 0.3 K and 1.4 K. b, The ... displacement field (at $n = n_0$) at 1.1 K. c, d, Colour plot of $\rho_{xx}$ and $\rho_{yx}$ as a function of carrier density and magnetic field. Dashed lines represent the $\nu = 2$ state.
Machine vision technology has taken huge leaps in recent years, and is now becoming an integral part of various intelligent systems, including autonomous vehicles and robotics. Usually, visual information is captured by a frame-based camera, converted into a digital format and processed afterwards using a machine-learning algorithm such as an artificial neural network (ANN). The large amount of (mostly redundant) data passed through the entire signal chain, however, results in low frame rates and high power consumption. Various visual data preprocessing techniques have thus been developed to increase the efficiency of the subsequent signal processing in an ANN. Here we demonstrate that an image sensor can itself constitute an ANN that can simultaneously sense and process optical images without latency. Our device is based on a reconfigurable two-dimensional (2D) semiconductor photodiode array, and the synaptic weights of the network are stored in a continuously tunable photoresponsivity matrix. We demonstrate both supervised and unsupervised learning and train the sensor to classify and encode images that are optically projected onto the chip with a throughput of 20 million bins per second.


ANNs have achieved huge success as machine-learning algorithms in
a wide variety of fields. The computational resources required to perform
machine-learning tasks are very demanding. Accordingly, dedicated
hardware solutions that provide better performance and energy
efficiency than conventional computer architectures have become a
major research focus. However, although much progress has been made
in efficient neuromorphic processing of electrical or optical
signals, the conversion of optical images into the electrical domain
remains a bottleneck, particularly in time-critical applications. Imaging
systems that mimic neuro-biological architectures may allow us to
overcome these disadvantages. Much work has therefore been devoted
to developing systems that emulate certain functions of the human eye,
including hemispherically shaped image sensors and preprocessing
of visual data, for example, for image-contrast enhancement, noise
reduction or event-driven data acquisition.

Here, we present a photodiode array that itself constitutes an ANN
that simultaneously senses and processes images projected onto the
chip. The sensor performs a real-time multiplication of the projected
image with a photoresponsivity matrix. Training of the network requires
setting the photoresponsivity value of each pixel individually. Conventional
photodiodes that are based, for example, on silicon exhibit
a fixed responsivity that is defined by the inner structure (chemical
doping profile) of the device, and are thus not suitable for the proposed
application. Other technologies such as photonic mixing and
metal-semiconductor-metal detectors may, in principle, be suitable,
but these device concepts bear additional challenges, such as nonlinear
tunability of the photoresponse and bias-dependent (and hence
weight-dependent) dark current. We have therefore chosen WSe₂, a 2D
semiconductor, as the photoactive material. 2D semiconductors not
only show strong light-matter interaction and excellent optoelectronic
properties but also offer the possibility of external tunability of the
potential profile in a device, and hence its photosensitivity, by electrostatic
doping using multi-gate electrodes. In addition, 2D materials
technology has by now achieved a sufficiently high level of maturity
to be employed in complex systems and provides ease of integration
with silicon readout/control electronics.

Figure 1a schematically illustrates the basic layout of the image sensor.
It consists of N photoactive pixels arranged in a 2D array, with each pixel
divided into M subpixels. Each subpixel is composed of a photodiode,
which is operated under short-circuit conditions and under optical
illumination delivers a photocurrent of $I_{mn} = R_{mn} E_n A = R_{mn} P_n$, where $R_{mn}$ is
the photoresponsivity of the subpixel, $E_n$ and $P_n$ denote the local irradiance
and optical power at the nth pixel, respectively, and $A$ is the detector
area. $n = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$ denote the pixel and subpixel
indices, correspondingly. An integrated neural network and imaging
array can now be formed by interconnecting the subpixels. Summing
all photocurrents produced by the mth detector element of each pixel
performs the matrix-vector product operation $\mathbf{I} = R\mathbf{P}$, with $R = (R_{mn})$
being the photoresponsivity matrix, $\mathbf{P} = (P_1, P_2, \ldots, P_N)^{\mathrm{T}}$ being a vector
that represents the optical image projected onto the chip and
$\mathbf{I} = (I_1, I_2, \ldots, I_M)^{\mathrm{T}}$ being the output vector. Provided that the $R_{mn}$ value of each
detector element can be set to a specific positive or negative value,
various types of ANNs for image processing can be implemented (see
Fig. 1c, d), with the synaptic weights being encoded in the photoresponsivity
matrix. The expression 'negative photoresponsivity' is to be
understood in this context as referring to the sign of the photocurrent.
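
The summation described above can be illustrated with a minimal numerical sketch. It assumes NumPy; the function name `photocurrents`, the random responsivities and the 100 nW pixel power are illustrative choices, not values from the measurements.

```python
import numpy as np


def photocurrents(R, P):
    """Return the M summed output currents I = R P.

    R: (M, N) signed photoresponsivity matrix in A/W.
    P: (N,) optical power per pixel in W.
    """
    R = np.asarray(R, dtype=float)
    P = np.asarray(P, dtype=float)
    if R.ndim != 2 or P.shape != (R.shape[1],):
        raise ValueError("expected R of shape (M, N) and P of shape (N,)")
    # Each subpixel delivers I_mn = R_mn * P_n; wiring subpixel m of every
    # pixel in parallel sums these contributions over the pixel index n.
    return (R * P).sum(axis=1)


# Illustrative values only: M = 3 outputs, N = 9 pixels (3 x 3 array).
rng = np.random.default_rng(0)
R = rng.uniform(-0.06, 0.06, size=(3, 9))  # within the +/-60 mA/W range
P = np.full(9, 1e-7)                       # hypothetical 100 nW per pixel
I = photocurrents(R, P)                    # three output currents in A
```

The element-wise product followed by a sum over pixels is equivalent to `R @ P`; it is written out here only to mirror the per-subpixel photocurrents and their summation on the chip.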
Fig. 1 | Imaging ANN photodiode array. a, Illustration of the ANN photodiode
array. All subpixels with the same colour are connected in parallel to generate M
output currents. b, Circuit diagram of a single pixel in the photodiode array.
c, d, Schematics of the classifier (c) and the autoencoder (d). Below the
illustration of the autoencoder, shown is an example of encoding/decoding of a
28 × 28 pixel letter from the MNIST handwritten digit database. The original
image is encoded to 9 code-layer neurons and then decoded back into an
image.

Fig. 2 | Implementation of the ANN photodiode array. a, Schematic of a single
WSe₂ photodiode. The device is operated under short-circuit conditions and
the photoresponsivity is set by supplying a voltage pair V_G/−V_G to the bottom-gate
electrodes. b, Macroscopic image of the bonded chip on the chip carrier.
Scale bar, 2 mm. First magnification: microscope image of the photodiode
array, which consists of 3 × 3 pixels. Scale bar, 15 μm. Second magnification:
scanning electron microscopy image of one of the pixels. Each pixel consists of
three WSe₂ photodiodes/subpixels with responsivities set by the gate voltages.
Scale bar, 3 μm. GND, ground electrode. c, Current-voltage characteristic
curve of one of the photodetectors in the dark (blue line) and under optical
illumination (red line). See also Extended Data Fig. 2a. The inset shows the
gate-voltage tunability of the photoresponsivity. d, Schematic illustration of the
optical setup. Laser light is linearly polarized by a wire-grid polarizer and
reflected by a spatial light modulator (SLM). The reflected light is then filtered
by an analyser (intensity modulation) and the resulting image is projected onto
the photodiode array. e, Microscope images of the 3 × 3 pixel letters used for
training/operation of the network.
Fig. 3 | Device operation as a classifier. a, Accuracy of the classifier during
training for varying artificial noise levels. An image is accurately predicted
when the correct neuron has the largest activation. b, Loss function for
different noise levels during training. The inset shows the initial and final
responsivity distributions for σ = 0.2. c, Microscope images of the projected
letters with different random noise levels. The complete dataset obtained over
30 epochs of training is shown in Extended Data Fig. 7. d, Average currents for
each epoch for each projected letter, measured during training with a noise
level of σ = 0.2. Each graph shows the results of a separate experiment, in which
the letters 'n' (top), 'v' (middle) and 'z' (bottom) are projected onto the chip, and
three currents, corresponding to 'n' (blue), 'v' (red) and 'z' (green), are
measured. In the top graph, for example, the output that corresponds to the
letter 'n' (current 'n') is the highest, so the ANN determines that the projected
letter is 'n'.

We implemented two types of ANNs: a classifier and an autoencoder.
Figure 1c shows a schematic of the classifier. Here, the array is operated
as a single-layer perceptron, together with nonlinear activation
functions that are implemented off-chip. This type of ANN represents
a supervised learning algorithm that is capable of classifying images P
into different categories y. An autoencoder (Fig. 1d) is an ANN that can
learn, in an unsupervised training process, an efficient representation
(encoding) for a set of images P. Along with the encoder, a decoder is
trained to attempt to reproduce at its output the original image, P′,
from the compressed data. Here the encoder is formed by the photodiode
array itself and the decoder by external electronics.

Having presented the operational concept of our network, we now
come to an actual device implementation. We used a few-layer WSe₂
crystal with a thickness of about 4 nm to form lateral p-n junction
photodiodes, using split-gate electrodes (with a ~300-nm-wide gap)
that couple to two different regions of the 2D semiconductor channel
(Fig. 2a). WSe₂ was chosen because of its ambipolar conduction
behaviour and excellent optoelectronic properties. Biasing one
gate electrode at V_G and the other at −V_G enables adjustable (trainable)
responsivities between −60 and +60 mA W⁻¹, as shown in Fig. 2c. This
technology was then used to fabricate the photodiode array shown in
Fig. 2b, which consists of 27 detectors with good uniformity, tunability
and linearity (see Extended Data Figs. 1, 2b). The devices were arranged
to form a 3 × 3 imaging array (N = 9) with a pixel size of about 17 × 17 μm²
and with three detectors per pixel (M = 3). The short-circuit photocurrents
I_SC produced by the individual devices under optical illumination
were summed according to Kirchhoff's law by hard-wiring the devices
in parallel, as depicted in Fig. 1b. The sample fabrication is explained
in Methods, and a schematic of the entire circuit is provided in Extended
Data Fig. 3. Each device was supplied with a pair of gate voltages, V_G
and −V_G, to set its responsivity individually. For training and testing
of the chip, optical images were projected using the setup shown in
Fig. 2d (for details, see Methods). Unless otherwise stated, all measurements
were performed using light with a wavelength of 650 nm and
with a maximum irradiance of about 0.1 W cm⁻². Despite its small size,
such a network is sufficient for the proof-of-principle demonstration
of several machine-learning algorithms. In particular, we performed
classification, encoding and denoising of the stylized letters 'n', 'v'
and 'z' depicted in Fig. 2e. Scaling the network to larger dimensions is
conceptually straightforward and remains a mainly technological task.

To test the functionality of the photodiode array, we first operated
it as a classifier (Fig. 1c) to recognize the letters 'n', 'v' and 'z'. During
each training epoch we optically projected a set of S = 20 randomly
chosen letters. Gaussian noise (with standard deviation of σ = 0.2, 0.3
and 0.4; Fig. 3c) was added to augment the input data. In this supervised
learning example, we chose one-hot encoding, in which each of
the three letters activates a single output node/neuron. As activation
function (the nonlinear functional mapping between the inputs and
the output of a node) for the M photocurrents we chose the softmax
function $\phi_m(\mathbf{I}) = e^{I_m \xi} / \sum_{k=1}^{M} e^{I_k \xi}$ (a common choice for one-hot encoding),
where $\xi = 10^{10}$ A⁻¹ is a scaling factor that ensures that the full value range
of the activation function is accessible during training. As a loss/cost
function (the function to be minimized during training) we used the
cross-entropy $\mathcal{L} = -\frac{1}{M} \sum_{m=1}^{M} y_m \log[\phi_m(\mathbf{I})]$, where $y_m$ is the label and
M is the number of classes. The activations of the output neurons
represent the probabilities for each of the letters. The initial values of
the responsivities were randomly chosen from a Gaussian distribution,
as suggested in the literature, and were different for the supervised- and
unsupervised-learning demonstrations. The responsivities were updated
after every epoch by backpropagation of the gradient of the loss
function with learning rate η = 0.1. A detailed flow chart of the training algorithm
is presented in Extended Data Fig. 4d.
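
A software analogue of this training loop may help to make the procedure concrete. The following is a minimal sketch, not the on-chip implementation: the 3 × 3 letter patterns are hypothetical stand-ins for those in Fig. 2e, the currents are expressed in normalized units (so the scaling factor `xi` is a dimensionless assumption rather than 10¹⁰ A⁻¹), and the update is assumed to be a gradient step averaged over the S samples of each epoch.

```python
import numpy as np

# Hypothetical 3 x 3 binary stand-ins for the letters 'n', 'v' and 'z';
# the patterns actually projected onto the chip are those of Fig. 2e.
LETTERS = {
    "n": [[1, 1, 0], [1, 0, 1], [1, 0, 1]],
    "v": [[1, 0, 1], [1, 0, 1], [0, 1, 0]],
    "z": [[1, 1, 0], [0, 1, 0], [0, 1, 1]],
}
X = np.array([np.ravel(LETTERS[k]) for k in "nvz"], dtype=float)  # (3, 9)
M, N = X.shape


def softmax(z):
    """Numerically stable softmax of a 1D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def train(sigma=0.2, epochs=35, S=20, eta=0.1, xi=3.0, seed=0):
    """Train the (M, N) responsivity matrix R with one update per epoch."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 0.01, size=(M, N))  # Gaussian initial responsivities
    for _ in range(epochs):
        grad = np.zeros_like(R)
        for _ in range(S):
            k = rng.integers(M)                        # randomly chosen letter
            P = X[k] + rng.normal(0.0, sigma, size=N)  # noisy projected image
            phi = softmax(xi * (R @ P))                # output activations
            y = np.eye(M)[k]                           # one-hot label
            # Gradient of L = -(1/M) sum_m y_m log(phi_m) with respect to R.
            grad += (xi / M) * np.outer(phi - y, P)
        R -= eta * grad / S  # assumed form: step averaged over the epoch
    return R


def accuracy(R, sigma=0.2, trials=300, seed=1):
    """Fraction of noisy letters whose largest output matches the label."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(M, size=trials)
    noisy = X[labels] + rng.normal(0.0, sigma, size=(trials, N))
    return float(np.mean(np.argmax(noisy @ R.T, axis=1) == labels))


R_trained = train()
acc = accuracy(R_trained)
```

As in the experiment, an image counts as correctly classified when the output belonging to its label is the largest, which is why `accuracy` only needs the arg-max of the summed currents.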

In Fig. 3a, b the accuracy and loss are plotted over 35 training epochs.
The loss decreases quickly for all noise levels and reaches a minimum
after 15, 20 and 35 epochs for σ = 0.2, σ = 0.3 and σ = 0.4, respectively.
The accuracy reaches 100% for all noise levels, with faster convergence
for less noise. In Fig. 3d we show the mean currents for each of the three
letters during each epoch for σ = 0.2 (see Extended Data Fig. 5c, d for the
other cases). The currents become well separated after about 10 epochs,
with the highest current corresponding to the label of the projected
letter. The inset in Fig. 3b shows histograms for the (randomly chosen)
initial and final responsivity values for σ = 0.2 (see also Extended Data
Fig. 5a, b). The robustness and reliability of the classification results of
the analogue vision sensor were verified by comparison of the accuracy
and loss with computer simulations of a digital system with the same
architecture and learning scheme (Extended Data Fig. 6).

Fig. 4 | Device operation as an autoencoder. a, Loss of the autoencoder during
training. The complete dataset of 30 epochs of training is given in Extended
Data Fig. 7. b, Responsivity and weight distributions before (initial) and after
(final) training. c, Autoencoding of noise-free letters. The encoder translates
the projected images into a current code, which is converted by the
nonlinearity into a binary activation code and finally reconstructed into an
image by the decoder. d, Randomly chosen noisy inputs (σ = 0.3) and the
corresponding reconstructions after autoencoding.

Next, we demonstrate encoding of image patterns with our device
operating as an autoencoder (Fig. 1d). We chose logistic (sigmoid)
activation functions for the code neurons, $\phi_m(I_m) = (1 + e^{-I_m \xi})^{-1}$, again
with $\xi = 10^{10}$ A⁻¹ as a scaling factor, as well as for the output neurons,
$P'_n = \phi_n(z_n) = (1 + e^{-z_n})^{-1}$, where $z_n = \sum_{m=1}^{M} W_{nm} \phi_m(I_m)$ and $W_{nm}$ denotes
the weight matrix of the decoder. We used the mean-square loss function
$\mathcal{L} = \frac{1}{2} \lVert \mathbf{P} - \mathbf{P}' \rVert^2$, which depends on the difference between the
original and reconstructed images. The responsivities were again
trained by backpropagation of the loss according to equation (2), with
a noise level of σ = 0.15. Along with the encoder responsivities, the
weights of the decoder $W_{nm}$ were trained. As shown in Fig. 4a, the loss
steeply decreases within the first ~10 training epochs and then slowly
converges to a final value after about 30 epochs. The initial and final
responsivities/weights of the encoder/decoder are shown in Fig. 4b
and Extended Data Fig. 8, and the coded representations for each letter
are depicted in Fig. 4c. Each projected letter delivers a unique signal
pattern at the output. A projected 'n' delivers negative currents to
code-layer neurons 1 and 2 and a positive current to code-layer neuron
3. After the sigmoid function, this causes only code-layer neuron 3 to
deliver a sizeable signal. The letters 'v' and 'z' activate two code-layer
neurons: 'v', code-layer neurons 0 and 2; 'z', code-layer neurons 1 and
2. The decoder transforms the coded signal back into an output that
correctly represents the input. To test the fault tolerance of the autoencoder,
we projected twice as noisy (σ = 0.3) images. Not only did the
autoencoder interpret the inputs correctly, but the reconstructions
were considerably less noisy (Fig. 4d).

As image sensing and processing are both performed in the analogue
domain, the operation speed of the system is limited only by physical
processes involved in the photocurrent generation. As a result, image
recognition and encoding occur in real time with a rate that is orders
of magnitude higher than what can be achieved conventionally. To
demonstrate the high-speed capabilities of the sensor, we performed
measurements with a 40-ns pulsed laser source (522 nm, ~10 W cm⁻²).
The photodiode array was operated as a classifier and trained beforehand,
as discussed above. We subsequently projected two letters ('v' and
'n') and measured the time-resolved currents of the two corresponding
channels. In Fig. 5 we plot the electric output pulses, which demonstrate
correct pattern classification within ~50 ns. The system is thus capable
of processing images with a throughput of 20 million bins per second.
This value is limited only by the 20-MHz bandwidth of the used amplifiers,
and substantially higher rates are possible. Such a network may
hence provide new opportunities for ultrafast machine vision. It may
also be employed in ultrafast spectroscopy for the detection and
classification of spectral events. We also note that the operation of the vision
sensor is self-powered (photovoltaic device) and electrical energy is
consumed only during training.
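
The quoted throughput follows directly from the classification time. As a short worked check (the helper name `throughput_per_second` is illustrative, and back-to-back operation with one result per ~50 ns is assumed):

```python
def throughput_per_second(classification_time_s):
    """Results delivered per second if each one takes classification_time_s."""
    if classification_time_s <= 0:
        raise ValueError("classification time must be positive")
    return 1.0 / classification_time_s


# About 50 ns per classification, as read from the time-resolved output pulses.
rate = throughput_per_second(50e-9)
print(f"{rate:.1e} per second")
```

A classification time of 50 ns thus corresponds to 2 × 10⁷ results per second, matching the 20 million bins per second stated above and the 20-MHz amplifier bandwidth that limits it.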

Let us now comment on the prospects for scalability. In our present
implementation the weights of the ANN are stored in an external
memory and supplied to each detector via cabling. Scaling will require
storing the weights locally. This could be achieved, for example, by
using ferroelectric gate dielectrics or by employing floating gate
devices. To demonstrate the feasibility of the latter approach, we
present in Extended Data Fig. 9 a floating split-gate photodetector.
Once set, this detector 'remembers' its responsivity value and delivers
a photocurrent of adjustable sign/magnitude. During training, each
detector could then be addressed by its column and row, using the
standard infrastructure of active pixel cameras.

Fig. 5 | Ultrafast image recognition. Projection of two different letters, 'v' and
'n', with a duration of 40 ns, leads to distinct output voltages of the labelled
channels.

Another important question is the number of required subpixels
M. As shown in the example in Fig. 1d, a segmentation of each pixel
into 3 × 3 subpixels may be adequate for some applications. Given the
exponential increase of network complexity with M, increasing the
segmentation to 6 × 6 subpixels would already result in a very powerful ANN
with a manageable number of 36 analogue outputs. We propose that
such a network may also be trained as a binary-hashing autoencoder,
eliminating the need for analogue-to-digital conversion. Binary hashing
encodes each feature into a binary code of the output signal, which
means that a 36-bit digital output allows as many as $2^{36} - 1 \approx 7 \times 10^{10}$
encodable features. The implementation of an analogue deep-learning
network becomes feasible by converting the photocurrents into voltages
that are then fed into a memristor crossbar. We finally remark that
besides on-chip training, demonstrated here, the network can also be
trained off-line using computer simulations, and the predetermined
photoresponsivity matrix is then transferred to the device.
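
The number of encodable features quoted for a 36-bit output can be checked with a few lines of integer arithmetic. The helper `encodable_features` is an illustrative name, and the all-zero code is assumed to be the one excluded.

```python
def encodable_features(n_bits):
    """Number of binary codes of length n_bits, excluding the all-zero code."""
    if n_bits < 1:
        raise ValueError("n_bits must be at least 1")
    return 2 ** n_bits - 1


subpixels = 6 * 6                   # 6 x 6 segmentation gives 36 outputs
n = encodable_features(subpixels)   # 68_719_476_735, roughly 7e10
```

The exact value, 68,719,476,735, rounds to the 7 × 10¹⁰ given in the text.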

In conclusion, we have presented an ANN vision sensor for ultrafast 
recognition and encoding of optical images. The device concept is 
easily scalable and provides various training possibilities for ultrafast 
machine vision applications. 
Methods


Device fabrication

The fabrication of the chip followed a previously reported procedure.
As a substrate we used a silicon wafer coated with 280-nm-thick SiO₂.
First, we prepared a bottom metal layer by writing a design with
electron-beam lithography (EBL) and evaporating Ti/Au (3 nm/30 nm).
Secondly, we deposited a 30-nm-thick Al₂O₃ gate oxide using atomic
layer deposition. Via holes through the Al₂O₃ isolator, which were
necessary for the connections between the top and bottom metal
layers, were defined by EBL and etched with a 30% solution of KOH in
deionized water. Thirdly, we mechanically exfoliated a ~70 × 120 μm²
WSe₂ flake from a bulk crystal (from HQ Graphene) and transferred
it onto the desired position on the sample by an all-dry viscoelastic
stamping method. The crystal thickness (about six monolayers, or
~4 nm) was estimated from the contrast under which it appears in an
optical microscope. Next, we separated 27 pixels from the previously
transferred WSe₂ sheet by defining a mask with EBL and reactive ion
etching with Ar/SF₆ plasma. Mild treatment with reactive ion etching
oxygen plasma allowed the removal of the crust from the surface of
the polymer mask that appeared during the preceding etching step.
Then, a top metal layer was added by another EBL process and Ti/Au
evaporation (Au thickness 32 nm). We confirmed the continuity and solidity of
the electrode structure by scanning electron microscopy and electrical
measurements. Finally, the sample was mounted in a 68-pin chip
carrier and wire-bonded.


Experimental setup 

Schematics of the experimental setup are shown in Fig. 2d and Extended
Data Fig. 4a-c. Light from a semiconductor laser (650 nm wavelength)
was linearly polarized before it illuminated a spatial light modulator
(SLM; Hamamatsu), operated in intensity-modulation mode. On the
SLM, the letters were displayed and the polarization of the light was
rotated depending on the pixel value. A linear polarizer with its optical
axis oriented normal to the polarization direction of the incident laser
light functioned as an analyser. The generated optical image was then
projected onto the sample using a 20× microscope objective with long
working distance (Mitutoyo). Pairs of gate voltages were supplied to
each of the detectors individually using a total of 54 digital-to-analogue
converters (National Instruments, NI-9264) and the three output currents
were measured by source meters (Keithley, 2614B). For time-resolved
measurements, a pulsed laser source emitting ~40-ns-long
pulses at 522 nm wavelength was used. The output current signals were
amplified with high-bandwidth (20 MHz) transimpedance amplifiers
(Femto) and the output voltages were recorded with an oscilloscope
(Keysight). For the time-resolved measurements, the analyser was
removed and the SLM was operated in phase-only mode to achieve
higher illumination intensities (~10 W cm⁻²). The phase-only Fourier
transforms of the projected images were calculated using the
Gerchberg-Saxton algorithm. For reliable and hysteresis-free operation,
the vision sensor was placed in a vacuum chamber (~10⁻⁶ mbar). Alternatively,
a protective dielectric encapsulation layer may be employed
to isolate the two-dimensional semiconductor from the environment.
Extended Data Fig. 1 | Photodiode array uniformity. Gate tunability of the responsivities of all 27 photodetectors. One of the detector elements (pixel 7, subpixel
2) did not show any response to light (due to a broken electrical wire), which, however, had no crucial influence on the overall system performance.


Extended Data Fig. 2 | Photodiode characteristics. a, Current-voltage
characteristic curve under dark (blue) and illuminated (green) conditions.
The series resistance $R_s$ and shunt resistance $R_{sh}$ are ~10⁶ Ω and ~10⁹ Ω,
respectively. For zero-bias operation, we estimate a noise-equivalent power of
NEP = $I_{th}/R \approx 10^{-13}$ W Hz⁻¹ᐟ², where R ≈ 60 mA W⁻¹ is the (maximum) responsivity
and $I_{th} = \sqrt{4 k_B T \Delta f / R_{sh}}$ the thermal noise, where $k_B$ is the Boltzmann constant,
Δf is the bandwidth and T is the temperature. b, Dependence of the short-circuit
photocurrent on the light intensity (power density, W m⁻²) for different split-gate voltages.
Importantly, the response is linear (I ∝ P), as assumed in equation (1).
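
The noise-equivalent power estimate in a can be reproduced numerically. The sketch below is an order-of-magnitude check only: room temperature (300 K), a 1 Hz bandwidth and a shunt resistance of 10⁹ Ω are assumed inputs, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def thermal_noise_current(r_shunt_ohm, temperature_k=300.0, bandwidth_hz=1.0):
    """Thermal (Johnson) noise current sqrt(4 k_B T df / R_sh) in A."""
    return math.sqrt(4.0 * K_B * temperature_k * bandwidth_hz / r_shunt_ohm)


def noise_equivalent_power(responsivity_a_per_w, r_shunt_ohm, temperature_k=300.0):
    """NEP = I_th / R in W per sqrt(Hz), evaluated for a 1 Hz bandwidth."""
    i_th = thermal_noise_current(r_shunt_ohm, temperature_k, 1.0)
    return i_th / responsivity_a_per_w


# Assumed inputs: R = 60 mA/W (maximum responsivity), R_sh = 1e9 ohm, T = 300 K.
nep = noise_equivalent_power(0.060, 1e9)
```

With these assumptions the result is just below 10⁻¹³ W Hz⁻¹ᐟ², consistent with the order of magnitude quoted in the caption.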
Extended Data Fig. 3 | Circuit of the ANN photodiode array.

Extended Data Fig. 4 | Experimental setup. a, Experimental setup for training
the classifier and the autoencoder. CW, continuous wave. b, Experimental
setup for time-resolved measurements. TIA, transimpedance amplifier. A pulse [...]
the optical setup (for schematic see Fig. 2d). d, Flow chart of the training
algorithm. The blue shaded boxes are interactions with the ANN photodiode
array.
Extended Data Fig. 6 | Comparison with computer simulation. Classifier
training of the analogue vision sensor (solid lines) and simulation of the system
on a computer (dashed lines) for different data noise levels. The same ANN
architecture, input data, effective learning rate and starting weights have been
used. The same accuracy and loss are eventually reached after training. The
slightly slower convergence of the analogue implementation compared with
the simulation reflects the nonidealities (defective subpixel, device-to-device
variations) of the former. Further discussion on the impact of nonidealities is
provided in Extended Data Fig. 10.

Extended Data Fig. 7 | Training datasets. a, b, Dataset of 30 epochs of classifier (a) and autoencoder training (b) with a test data noise level of σ = 0.4 and σ = 0.15, respectively.

Extended Data Fig. 8 | Autoencoder photoresponsivities/weights. a, b, Initial (a) and epoch 30 (b) encoder photoresponsivity values (left) and decoder weights
(right).
Extended Data Fig. 9 | Floating-split-gate photodiode with memory.
a, Schematic of the floating gate photodiode. The addition of 2-nm-thick Au
layers, sandwiched between Al₂O₃ and hexagonal boron nitride (hBN), enables
the storage of electric charge when a gate voltage is applied to the device,
acting as a floating-gate memory. b, Electronic characteristic curves of the
photodiode operated in p-n, n-p and p-p configurations. c, The ability of the
device to 'remember' the previous configuration can be verified from the
time-resolved photocurrent measurement. The measurement is performed as
follows: the back-gate voltages are set to V_G1 = +5 V and V_G2 = −5 V and are then
disconnected, that is, there is no longer an applied gate voltage and the only
electric field is that generated by the charge stored on the floating electrodes.
The short-circuit photocurrent is then measured upon optical illumination.
The light is then switched off, at ~1,100 s, with a corresponding drop of the
photocurrent to zero. After ~1,600 s, the light is switched on again, causing the
current to reach its initial value, and then a smaller value when the intensity of
the light is reduced (~1,700 s). After ~2,300 s, the opposite voltage
configuration is applied to the back gates (V_G1 = −5 V and V_G2 = +5 V), inducing a
polarity inversion that also remains permanent. Now, a positive photocurrent
(red line) is obtained.


Extended Data Fig. 10 | Robustness of the network. a, Detector uniformity,
extracted from Extended Data Fig. 1. The fitted Gaussian probability
distribution has a standard deviation of σ = 0.205 (40 mA W⁻¹ V⁻¹). b, Monte
Carlo simulation of a vision sensor with detector responsivities of a given
standard deviation. (The photodetectors of the actual device have a measured
photoresponsivity standard deviation of 0.205.) Trained on the MNIST
database of handwritten digits, the classifier has 784 pixels and 10 subpixels
per pixel. For each data point, 50 random photoresponsivity variations were
evaluated. c, Accuracy dependence on the number of (randomly chosen)
defective subpixels. The same ANN and Monte Carlo simulation scheme as in
b were used. For each data point, 50 random sets of modified
photoresponsivities were evaluated.
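
The kind of Monte Carlo test described in b and c can be sketched on a much smaller scale. The example below is a toy illustration under explicit assumptions, not the MNIST simulation behind the figure: it uses hypothetical 3 × 3 letter patterns, idealized matched-filter weights instead of trained ones, multiplicative Gaussian responsivity variations, and defective subpixels modelled as zero responsivity.

```python
import numpy as np

# Hypothetical 3 x 3 letter patterns (stand-ins for 'n', 'v' and 'z').
X = np.array([
    [1, 1, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 0, 1, 1],
], dtype=float)
# Idealized weights: one matched filter per class, used in place of training.
W_IDEAL = 3.0 * X - X.sum(axis=0)


def mc_accuracy(rel_std, n_failed=0, runs=50, trials=200, sigma=0.2, seed=0):
    """Mean accuracy over `runs` random perturbations of the weights."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(runs):
        # Assumed multiplicative device-to-device variation per responsivity.
        W = W_IDEAL * (1.0 + rel_std * rng.standard_normal(W_IDEAL.shape))
        # Defective subpixels are modelled as delivering no photocurrent.
        dead = rng.choice(W.size, size=n_failed, replace=False)
        np.put(W, dead, 0.0)
        labels = rng.integers(3, size=trials)
        P = X[labels] + sigma * rng.standard_normal((trials, 9))
        scores.append(np.mean(np.argmax(P @ W.T, axis=1) == labels))
    return float(np.mean(scores))


acc_ideal = mc_accuracy(0.0)                  # no variation, no failures
acc_varied = mc_accuracy(0.205)               # variation as measured on the chip
acc_all_dead = mc_accuracy(0.0, n_failed=27)  # every subpixel defective
```

Sweeping `rel_std` or `n_failed` and plotting the returned accuracy gives curves of the same type as panels b and c, although the numbers are not comparable with the 784-pixel network used there.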
The Hall-Petch relationship, according to which the strength of a metal increases as
the grain size decreases, has been reported to break down at a critical grain size of
around 10 to 15 nanometres. As the grain size decreases beyond this point, the
dominant mechanism of deformation switches from a dislocation-mediated process
to grain boundary sliding, leading to material softening. In one previous approach,
stabilization of grain boundaries through relaxation and molybdenum segregation
was used to prevent this softening effect in nickel-molybdenum alloys with grain sizes
below 10 nanometres. Here we track in situ the yield stress and deformation texturing
of pure nickel samples of various average grain sizes using a diamond anvil cell
coupled with radial X-ray diffraction. Our high-pressure experiments reveal
continuous strengthening in samples with grain sizes from 200 nanometres down to 3
nanometres, with the strengthening enhanced (rather than reduced) at grain sizes
smaller than 20 nanometres. We achieve a yield strength of approximately 4.2
gigapascals in our 3-nanometre-grain-size samples, ten times stronger than that of a
commercial nickel material. A maximum flow stress of 10.2 gigapascals is obtained in
nickel of grain size 3 nanometres for the pressure range studied here. We see similar
patterns of compression strengthening in gold and palladium samples down to the
smallest grain sizes. Simulations and transmission electron microscopy reveal that
the high strength observed in nickel of grain size 3 nanometres is caused by the
superposition of strengthening mechanisms: both partial and full dislocation
hardening plus suppression of grain boundary plasticity. These insights contribute to
the ongoing search for ultrastrong metals via materials engineering.


Understanding the strengthening of nanograined metals has been
puzzling, as mixed results of both size softening and hardening have
been reported. The main challenges in resolving this debate are the
difficulty in synthesizing high-quality, ultrafine-grained metal samples
for traditional tension or hardness tests and making statistically
reproducible measurements. Some researchers have pointed out that
reported size softening may be related to the preparation of materials.
Porosity, amorphous regions and impurities may be introduced
during sample preparation by methods such as inert gas condensation
and electrodeposition, leading to softening in microhardness
measurements and tension tests. Another difficulty is identifying the
dominant plastic deformation mechanisms of nanograined metals.
Various defects or processes at the nanoscale have been reported,
including dislocations, deformation twinning, stacking faults,
grain boundary (GB) migration, GB sliding and grain rotation.
Hence, the processes that dominate plastic deformation and thus
determine the strength of nanograined metals are still unclear.

In this study, we use radial diamond anvil cell (DAC) X-ray diffraction
(XRD) techniques to track in situ the yield stress and deformation
texturing of nickel of various grain sizes. We find that mechanical
strengthening can be extended down to a grain size of 3 nm (the smallest
we have available), which is much smaller than the previously reported
strongest sizes of nanograined metals. This finding pushes mechanical
strengthening to the lowest recorded grain size (to our knowledge),
demonstrating the potential for achieving ultrahigh strengths in
metals.
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Fig.1|Sizestrengthening ofnanograined nickel.a, Azimuthally (0-360°) 
unrolled diffraction images of nickel at different pressures. The black arrows 
indicate theaxial compression direction. Each measurementis repeated at 
least ewo times. b, Differential stress versus the lattice strain of nickel 

(see Supplementary Information). We note that for some ofthe data points the 
error bars (standard deviations; see equations (6)-(9) and Supplementary 
Information) aresmaller than the sizes of symbols.¢, Extrapolated yield 
strength of nickel atambient conditions without GB sliding (from EVPSC 


Radial DAC XRD experiments (Extended Data Fig. 1, see Supplementary
Information) were performed at beamline 12.2.2 at the Advanced
Light Source, Lawrence Berkeley National Laboratory, and at the Shanghai
Synchrotron Radiation Facility. Eight nickel samples with particle
(grain) sizes ranging from 3 nm to 200 nm (Extended Data Figs. 2, 3)
were measured. The relatively narrow size distributions allow for the
investigation of the size dependence of the material's strength. In a
sample under uniaxial compression, the stress can be separated into
hydrostatic and deviatoric stress components. The differential stress
between the maximum and minimum compression directions can be
obtained using deviatoric strain theory (see Supplementary Information).
The measured XRD peak positions provide information on differential
strains as well as differential stresses (Fig. 1a). Plastic deformation
has an influence on deviatoric strains measured using diffraction, and so
the differential strain/stress measured with radial DAC XRD can capture
the transition from elastic- to plastic-deformation-dominant behaviours
and can provide information on yield strength, strain hardening
and so on (Fig. 1 and Extended Data Fig. 4). At the same pressure, the
simulations) versus grain size. The yield strength of nanograined Cu is
obtained from molecular dynamics simulations (ref. ) and experimental data
of nanotwinned Cu (ref.”) and nanograined Ni (refs.*”). The yield strength
value of nickel in ref.* is taken as one-third of its hardness. For nanotwinned Cu,
d represents the twin thickness. The inverse Hall–Petch effect has been
reported for both Cu (refs. "”) and Ni (ref. *). The smallest grain size of nickel in
the study of ref.* is 12 nm.


differential strain of the 3-nm-grain-sized nickel is higher than that of
larger-grained counterparts. The larger curvatures (the ellipticity of the
XRD rings, which translates into nonlinearity of the lines plotted along
the azimuth angle) of diffraction lines for smaller nanocrystals indicate
higher elastic deformation and the greater ability of the material to support
differential stress in the crystal plane without plastic deformation.
We used Rietveld refinement implemented in the MAUD software to
analyse the differential strain and texture of our samples at each pressure.
The average differential stress of nickel versus its lattice strain can
thus be obtained (Fig. 1b) using equations (5) to (9) (see Supplementary
Information). To remove the effect of hardening induced by hydrostatic
pressure, we performed elasto-viscoplastic self-consistent (EVPSC)
simulations (see Supplementary Information) to simulate the stress–strain
curves of nickel under ambient conditions (Extended Data Fig. 5).
This enables the comparison of our extrapolated strength results at zero
pressure with those of conventional tests (Fig. 1c). The stress–strain
curves (Fig. 1b and Extended Data Fig. 5c) show that instead of softening,
smaller-grained nickel was stronger than its coarser counterparts, in
Fig. 2 | Inverse pole figures for the texture evolution of nickel with various
grain sizes. Inverse pole figures show the probability of finding the pole to
a lattice plane in the compression direction. We note that no starting texture
exists in any of the raw (uncompressed powder) samples. t and ε represent differential
strong contrast to the results of previous studies (Fig. 1c). The stress–strain
curves of nanograined nickel also show a larger slope/hardening
exponent (Fig. 1b), possibly owing to the increased plastic anisotropy
in this smaller grain size (Extended Data Fig. 5). We note that a slight
strength drop occurs in 40-nm-grain-sized nickel in EVPSC simulations;
the cause remains to be further investigated.

The development of in situ deformation textures for nanograined 
nickel with various grain sizes was captured at different strains. As 
shown in Fig. 2, nickel samples with larger grain sizes above 20 nm show
very strong deformation textures even at low strain. Nanograined nickel
samples with grain sizes below 20 nm exhibit very weak deformation
textures, indicating that traditional full dislocation activity becomes
less active, whereas the strength increases with decreasing grain size.
Meanwhile, all of the nickel samples develop a deformation texture,
indicating that deformation mechanisms may still be based on dislocation
slip and twin formation, since GB-mediated mechanisms would
maintain the initial random textures. 

Previous simulations have suggested that GB deformation plays
a decisive part in the deformation mechanisms of sub-10-nm-grain-sized
nanomaterials. Those studies proposed that size softening would
occur as a result of the transition from dislocation-mediated to GB-mediated
mechanisms. In our experiments, however, we observed no
size softening but only size strengthening. The uniaxial compressional
stress comprises hydrostatic and deviatoric stress components. The
shear stress arising from the deviatoric stress could potentially activate
GB mechanisms, whereas the hydrostatic stress of the compression
increases the critical shear stress for GB migration and sliding, thereby
suppressing those mechanisms; see equation (15). 
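
The split of a uniaxial compressive stress into hydrostatic and deviatoric parts can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it assumes an axisymmetric stress state (one axial and two equal radial principal stresses) with compression counted as positive, and the function name and the example values are hypothetical rather than taken from the measurements.

```python
def decompose_uniaxial_stress(axial, radial):
    """Split an axisymmetric stress state into hydrostatic and deviatoric parts.

    axial  -- principal stress along the compression axis (e.g. in GPa)
    radial -- principal stress in the two transverse directions (same unit)
    Compression is taken as positive (an assumption of this sketch).
    """
    # Hydrostatic part: mean of the three principal stresses.
    hydrostatic = (axial + 2.0 * radial) / 3.0
    # Differential stress: maximum minus minimum principal stress.
    differential = axial - radial
    # Deviatoric principal components; by construction they sum to zero.
    deviatoric = (radial - hydrostatic, radial - hydrostatic, axial - hydrostatic)
    # Maximum shear stress, the part that could drive slip or GB sliding.
    max_shear = differential / 2.0
    return hydrostatic, differential, deviatoric, max_shear


# Hypothetical values for illustration only (not measured data).
p, t, dev, tau = decompose_uniaxial_stress(axial=14.0, radial=8.0)
print(p, t, dev, tau)
```

In this picture the hydrostatic part p is the component said to raise the critical shear stress for GB migration and sliding, while the differential stress t carries the shear that drives plastic deformation.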

To explore the mechanisms for continuous size strengthening, we 
simulated the critical stress for activating full and partial dislocations 
and for activating GB deformation (GB sliding and migration) in 
nanograined nickel. As shown in Fig. 3a, full dislocations are activated 
preferentially and are more dominant than partial dislocations above 
the critical grain size d’. The dislocation-dominant deformation shifts
to GB-dominant deformation for grain sizes below a critical grain size 
d?. However, compression has a remarkable effect on this shift. The 
critical stress for activating GB deformation increases with pressure, 
resulting in the critical grain size d being highly pressure-dependent. 
For example, the critical grain size for active GB deformation of nickel
at pressures above 1 GPa is below 2 nm (Fig. 3b); this suggests that almost no GB deformation
is activated in our experiments because the hydrostatic pressure is higher
than 1 GPa, that is, the GB-deformation associated softening of 
nanograins has been greatly inhibited during compression. Conse- 
quently, when GB-associated deformation (extrinsic deformation) is 
stress and lattice strain, respectively. Texture strength is expressed as
multiples of random distribution, for which 1 denotes a random distribution
and a higher number represents a stronger texture.


suppressed, the material strength should be determined mainly by 
intrinsic deformation properties, which are associated with lattice 
strain and defects in the interiors of grains. 

It is known that the critical stress to activate dislocations increases
with decreasing grain size. In a simplified analytical dislocation model
that considers partial dislocations emitted from GBs of nanograins,
the critical stress for emitting a full and partial dislocation can be
described as: 


() 
where b, and b, are the Burgers vectors of the full and partial dislocations,
respectively; G is the shear modulus; γ is the stacking fault
energy; and δ is the ratio of equilibrium stacking fault width to grain
size. The critical stresses for nucleating both full and partial dislocations
increase sharply as the grain size decreases towards the lower
limit (Fig. 3a). This leads to the increase of yield strength at small grain
size. Furthermore, partial dislocations are preferentially activated and 
overtake full dislocations below a critical grain size. 

We studied the deformation behaviour of nanograined nickel by 
molecular dynamics simulations (Fig. 3d). Two types of planar defects
associated with partial dislocations (that is, nanotwins and stacking
faults) as well as full dislocations are found in nanograined nickel under
compression. To explore the deformation mechanisms, we conducted
transmission electron microscopy (TEM) characterization on the recovered
samples. As expected, high densities of full dislocations are seen in
the coarse-grained sample (Fig. 4d). Remarkably, full dislocations are
prevalent at all average grain sizes including the finest at 3 nm (Fig. 4a–c),
although for the 3-nm-grain-sized sample the dislocations were
observed in grains with slightly larger sizes than average. A detailed
analysis of the four dislocations observed in the lower part of Fig. 4a
is shown in the sketch in Fig. 4b based on the Thompson tetrahedron.
Each of these dislocations is an extended dislocation composed of a
stacking fault and two partial dislocations lying on {111} slip planes.
A Lomer–Cottrell lock and a stair rod are formed from the reactions
of partial dislocations associated with the upper three dislocations,
with the stair rod lying on a {100} plane. These reaction products are
immobile and thus provide a strong strengthening effect. A rough
estimate of the density of full dislocations based on the dislocations
in Fig. 4a suggests a density of about 10" m7, which provides a strong
Fig. 3 | Computational simulation results and the modified Hall–Petch
relationship. a, Comparison of simulated grain-size-dependent critical
stresses for activating dislocations and GB deformation in nanograined nickel
at different pressures. The ⊥ symbol represents a dislocation. b, Critical grain
size d? as a function of pressure. c, The predicted yield strength compared with
the experimental data for nanograined nickel. d, Classic molecular dynamics


strengthening component for the flow stress. At the finest grain sizes,
nanotwins form bounded by stacking faults, creating important new
additions to the deformed structure. These nanotwins further refine
the nanostructure and contribute to boundary strengthening by constraining
dislocation motion. Steps in the twin boundaries are observed,
forming incoherent twin boundaries that contain partial dislocations
(Fig. 4). We note that the simultaneous and cooperative activation
of different Shockley partial dislocations on parallel and neighbouring
glide planes may be responsible for these twins. Stacking faults
may expand under high stress, increasing their energy and making it
favourable to form low-energy twins. Fivefold symmetry twins are also
seen in both 3-nm-grain-sized and 20-nm-grain-sized quenched nickel
samples (Fig. 4). Fivefold twins may pre-exist in the particles or form by
the successive emission of partial dislocations from incoherent twin
boundaries with high energy. The non-parallel twin boundaries give
rise to strong overlapping of associated lattice strain fields, resulting in
higher yield strength compared to those without fivefold twinned structures.
This result is consistent with our observations in the mechanical
measurements (Fig. 1b). In short, twinning and stacking faults observed
in our TEM measurements originate from the nucleation and motion
of partial dislocations. This provides compelling evidence that in the
sub-20-nm regime of grain size, full-dislocation-mediated deformation
shifts to both full and partial dislocations combined with deformation
twinning.

Our strength measurements (Fig. 1b), computational simulations 
(Fig. 3a, c) and TEM observations (Fig. 4) indicate that a critical grain 
simulation of 3-nm-grain-sized Ni compressed with 10% volume strain. Green
indicates partial dislocations associated with stacking faults, twins and grain
boundaries. Blue indicates perfect dislocations. Yellow, purple and red
indicate a few 1/3⟨001⟩ (Hirth), 1/6⟨110⟩ (stair-rod) and other types of
dislocations. 


size (around 20 nm) exists and corresponds to the shift in deforma- 
tion mechanisms from full dislocation to full plus partial dislocation 
mediated deformation. This does not generate a maximum strength 
at the critical grain size but starts a stronger mode of strengthening.
Notably, as shown in Fig. 4, the twins in 20 nm or smaller nickel grains
are usually only several nanometres thick, but unlike growth twins, no
softening is induced in pressurized nickel nanograins. Instead, size 
strengthening of nickel is even more pronounced in the smaller size 
range of nanograins. As shown in Fig. 3c, for grain sizes below 20 nm 
the measured yield strength of nanograined nickel largely deviates 
from the trend predicted by the traditional Hall–Petch model. Considering
that the contribution of partial dislocations becomes important
in fine nanograins, we propose a modified Hall-Petch relationship as 
follows: 


$$\sigma_y = \sigma_0 + k_1 d^{-1/2} + k_2 d^{-1} \qquad (3)$$


where σ_y and d represent the yield strength and grain size, respectively,
and σ_0, k_1 and k_2 are constants. The first two terms represent the friction
stress and Hall–Petch formulation associated with full dislocation
boundary interaction. The third term is related to the partial dislocation
contribution to yielding, which is inversely proportional to grain size
d according to equation (2). The fitting of our experimental data with
equation (3) shows that this new model reflects the effects of both full
and partial dislocations, and can describe the size strengthening of


Fig. 4 | TEM examinations of nickel samples quenched from 40 GPa of three
grain sizes. a, 3 nm; c, 20 nm; and d, 200 nm. Panel b is a sketch showing the
analysis of dislocations observed in the lower part of the middle grain in panel
a. We note the reactions of partial dislocations forming Lomer–Cottrell locks


metals over a wide size range. We note that this fit gives a high friction
stress of about 1.1 GPa, k_2 of 9 MPa µm and a low k_1 of 101 MPa µm^{1/2}
compared to conventionally deformed Ni with 20 MPa and 158 MPa µm^{1/2},
respectively. To check the generality of the size strengthening for
nanograined metals, we conducted similar high-pressure deformation
experiments on nanograined gold and palladium. A similar enhanced
strengthening effect at the lowest grain sizes was observed, which
indicates that this full plus partial dislocation-mediated strengthening
is common in compressed nanograined metals.
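
The relative weight of the three terms described above (a constant friction stress, a Hall–Petch term scaling as d^{-1/2}, and a partial-dislocation term scaling as d^{-1}) can be explored with a short calculation. The sketch below is illustrative only. It assumes the fitted values quoted in the text, with the 9 MPa µm coefficient assigned to the 1/d term and the 101 MPa µm^{1/2} coefficient to the d^{-1/2} term on the basis of their units; the function names are hypothetical, and grain size is expressed in micrometres.

```python
import math


def hall_petch_classic(d_um, sigma0=20.0, k1=158.0):
    """Classical Hall-Petch yield strength in MPa (grain size d in micrometres).

    Defaults are the values quoted for conventionally deformed Ni.
    """
    return sigma0 + k1 / math.sqrt(d_um)


def hall_petch_modified(d_um, sigma0=1100.0, k1=101.0, k2=9.0):
    """Hall-Petch form with an additional 1/d term, in MPa.

    Defaults are the fitted values quoted in the text; which coefficient
    belongs to which term is inferred from the units (an assumption).
    """
    return sigma0 + k1 / math.sqrt(d_um) + k2 / d_um


def crossover_grain_size_um(k1=101.0, k2=9.0):
    """Grain size at which the 1/d term equals the d^(-1/2) term."""
    return (k2 / k1) ** 2


for d_nm in (200, 20, 3):
    d_um = d_nm / 1000.0
    print(d_nm, round(hall_petch_classic(d_um)), round(hall_petch_modified(d_um)))
print("crossover (nm):", round(1000.0 * crossover_grain_size_um(), 1))
```

With these assumed coefficients the 1/d term overtakes the d^{-1/2} term only at grain sizes of a few nanometres, which is one way to read the statement that the partial-dislocation contribution becomes important in fine nanograins. The numbers are outputs of the fitted form, not independent measurements.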

This size strengthening effect may not only apply to high-pressure
cases but also provide guidance for applications at ambient conditions.
A recent study reported that a twofold increase in hardness was
achieved in nanograined Ni–Mo alloys by stabilizing GBs through Mo
segregation. By using this technique, yield strengths of around 1.6 GPa
and 3.8 GPa were achieved in Ni and Ni–Mo alloys, respectively. In our
experiments, an ultrahigh strength of about 4.2 GPa is achieved in pure
nickel grains. This result suggests that compression is an effective 
method of suppressing GB sliding and migration in order to achieve 
ultrahigh strength. 

This is also supported by the observation that the measured strength
of coarse nickel in our compression test is higher than in conventional
tension tests. In real applications, materials could be under either
tension or compression. Tension tests are common in traditional
mechanical characterization. However, evaluation of strength by tensile
loading is often technically difficult for nanograined metals, especially
for sub-10-nm grain sizes. Compressive strength measurements using
radial DAC XRD enable study of the mechanical properties of even
and a stair rod. Stacking faults (SFs), twin boundaries (TBs) and a few full
dislocations can be found in 3 and 20 nm nickel samples. A high density of full
dislocations is observed in the 200 nm nickel grains. The inset to d shows a
high-resolution image of a full dislocation.


sub-10-nm-sized metals. In this synchrotron-based study, deformation
behaviour and yield strength are obtained from the lattice changes
of a large quantity of nanograins and exhibit reproducible trends in
strength and grain size. Additionally, extrinsic factors like impurities 
and amorphous regions that may be introduced during conventional 
sample preparation could strongly affect the mechanical behaviour of 
nanograined metals. In our method, the strength in pure nickel grains
is determined by the internal piezometer of crystalline lattice strain, 
which mitigates the effects of extrinsic factors. 

Experimental work also indicates that partial-dislocation-associated
mechanisms improve the thermal stability of grain boundaries of
nanograins. If the grain boundaries of nanograined metals are sintered
without grain coarsening, for example, through severe plastic deformation
or explosive shock annealing, large pieces of nanograined
metals with ultrahigh strength could potentially be fabricated for mass
applications. In summary, achieving an ultrahigh strength in pure nickel
through grain refinement and suppression of GB plasticity provides
a new strategy for designing ultrastrong, ultrahard metals for future
applications. 
Extended Data Fig. 1 | The experimental setup of radial DAC XRD. Kapton is a polyimide film; hkl represents the lattice planes; S is the azimuthal angle; and 6
Extended Data Fig. 2 | Grain size distribution of nickel samples. a–d, Grain size distributions in 3-nm, 8-nm, 12-nm and 20-nm nickel; e–h, Grain size distribution
of 40-nm, 70-nm, 100-nm and 200-nm nickel. The particle sizes of the nickel samples were re-checked with XRD characterization.
Extended Data Fig. 3 | Raw powder samples. a–d, TEM images of raw powder samples of 3 nm (a), 8 nm (b), 12 nm (c) and 20 nm (d) nickel powder before
compression. e–h, Scanning electron microscopy characterization of 40 nm (e), 70 nm (f), 100 nm (g) and 200 nm (h) nickel powder before compression.
Extended Data Fig. 4 | Plot of differential stress versus hydrostatic lattice
strain in the nickel of various grain sizes. The circles, squares and triangles
represent (220), (200) and (111) lattice planes, respectively. Strong strength
anisotropy is exhibited for different lattice planes, especially at smaller grain
sizes. The lattice strain is calculated from the relative change in the unit cell
parameter at a given applied stress to the unit cell parameter under ambient
pressure (see Supplementary Information). The error bars for differential
stress are calculated based on the error of deviatoric strain Q(hkl) and equations
(6) to (9). Note that for some of the data points the error bars
(see Supplementary Information for definition) are smaller than the sizes of
symbols.
The ability to grow properly sized and good quality crystals is one of the cornerstones
of single-crystal diffraction, is advantageous in many industrial-scale chemical
processes, and is important for obtaining institutional approvals of new drugs for
which high-quality crystallographic data are required. Typically, single crystals
suitable for such processes and analyses are grown for hours to days during which any
mechanical disturbances (believed to be detrimental to the process) are carefully
avoided. In particular, stirring and shear flows are known to cause secondary
nucleation, which decreases the final size of the crystals (though shear can also
increase their quantity). Here we demonstrate that in the presence of polymers
(preferably, polyionic liquids), crystals of various types grow in common solvents, at
constant temperature, much bigger and much faster when stirred, rather than kept
still. This conclusion is based on the study of approximately 20 diverse organic
molecules, inorganic salts, metal–organic complexes, and even some proteins.
On typical timescales of a few to tens of minutes, these molecules grow into regularly
faceted crystals that are always larger (with longest linear dimension about 16 times
larger) than those obtained in control experiments of the same duration but without
stirring or without polymers. We attribute this enhancement to two synergistic
effects. First, under shear, the polymers and their aggregates disentangle, compete
for solvent molecules and thus effectively 'salt out' (that is, induce precipitation by
decreasing solubility of) the crystallizing species. Second, the local shear rate is
dependent on particle size, ultimately promoting the growth of larger crystals (but
not via surface-energy effects as in classical Ostwald ripening). This closed-system,
constant-temperature crystallization driven by shear could be a valuable addition to
the repertoire of crystal growth techniques, enabling accelerated growth of crystals 
required by the materials and pharmaceutical industries. 


Although the phenomena we describe are observed even in solutions 
stirred by an ordinary magnetic stir bar (Supplementary Video), most 
experiments were performed in a standardized Couette cell, with a gap
of d = 1 mm and the inner cylinder (r_i = 4 mm in radius) rotating at a
constant angular velocity, usually ω = 400 rpm, corresponding to shear
rate γ̇ = 167 s⁻¹ (Fig. 1) but down to 60 rpm in some control experiments
(see Fig. 4a). In all experiments, the Reynolds number (for the inner
cylinder) Re = r_i ω d/ν (where ν is the kinematic viscosity) was smaller than
2, ensuring simple Couette flow rather than the more complicated flow
regimes expected for Re > 100 (ref. "). The typical procedure described
here is for simple trimesic acid (TA), but is similar to that for other
systems discussed later (see Fig. 3 and Supplementary Information
section 2.2).
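
The quoted shear rate follows directly from the cell geometry. As a minimal sketch, assuming the narrow-gap approximation (shear rate equal to the wall speed of the inner cylinder divided by the gap) and using hypothetical function names, the numbers above can be checked as follows:

```python
import math


def angular_velocity_rad_s(rpm):
    """Convert revolutions per minute to radians per second."""
    return rpm * 2.0 * math.pi / 60.0


def couette_shear_rate(rpm, inner_radius_m, gap_m):
    """Mean shear rate (1/s) in the narrow-gap approximation."""
    wall_speed = angular_velocity_rad_s(rpm) * inner_radius_m  # m/s
    return wall_speed / gap_m


def reynolds_number(rpm, inner_radius_m, gap_m, kinematic_viscosity_m2_s):
    """Re = r * omega * d / nu for the rotating inner cylinder."""
    omega = angular_velocity_rad_s(rpm)
    return inner_radius_m * omega * gap_m / kinematic_viscosity_m2_s


def min_viscosity_for_re(rpm, inner_radius_m, gap_m, re_max=2.0):
    """Smallest kinematic viscosity (m^2/s) that keeps Re below re_max."""
    omega = angular_velocity_rad_s(rpm)
    return inner_radius_m * omega * gap_m / re_max


# Geometry quoted in the text: 4 mm inner radius, 1 mm gap.
print(round(couette_shear_rate(400, 4e-3, 1e-3), 1))  # 167.6, i.e. about 167 s^-1
print(round(couette_shear_rate(60, 4e-3, 1e-3), 1))   # control experiments at 60 rpm
print(min_viscosity_for_re(400, 4e-3, 1e-3))          # viscosity implied by Re < 2
```

The last line only restates the condition Re < 2 as a lower bound on the kinematic viscosity at 400 rpm; the actual viscosities of the polymer solutions are not given in this passage.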

In brief, we start by mixing an undersaturated solution of a crystallizing
substance in a solvent (47 mg of TA per 0.4 ml of dimethylformamide
(DMF)) with 0.35 ml of the same solvent containing a polyionic liquid
polymer (300 mg of poly(3-cyanomethyl-1-vinylimidazolium
bis(trifluoromethanesulfonyl)imide); henceforth PIL-1, molecular
weight 4.02 × 10⁵ g mol⁻¹; Fig. 2; Supplementary Video 2). The concentration
of TA in the 0.75 ml of solution thus prepared exceeds the saturation
level by 15.5 mg, even though approximately twice as much TA can be
dissolved in pure DMF (92.6 mg per 0.75 ml), which indicates that PIL-1
and TA are competing for shared DMF. The TA/PIL-1/DMF mixture is
poured into the Couette cell and, when the inner cylinder begins to
rotate, is subject to uniform shear flow (Supplementary Information
section 4.4). First, needle-shaped crystals become visible to the naked
eye after about 30 s of rotation, move with the fluid, and gradually grow
to about 440 µm after 10 min and about 740 µm after one hour (Fig. 1c
and blue line in Fig. 1d). Spectroscopic signatures (from powder X-ray
diffraction and ¹H nuclear magnetic resonance (NMR) of washed and
Fig. 1 | Shear-enhanced growth of TA crystals in the presence of an ionic
polymer. a, Illustration of experimental set-up. b, c, Optical images of TA
crystals grown within 10 min from the same TA/PIL-1/DMF solution without (b)
and with (c) applied shear. We note that the scale bars are different. Image b is
taken with bright field illumination; image c is taken with crossed polarizers.
d, Average sizes of TA crystals increasing with time under shear in the presence
of PIL-1 (blue; statistics based on n = 49–58 crystals analysed for each time
point), without shear but in the presence of PIL-1 (black; n = 35–43), and with
shear in pure DMF (green; n = 236–391). Error bars indicate standard deviations
of sizes, not the errors of the mean. The green line corresponds to control
experiments without PIL-1 but under shear, in which 15.5 mg oversaturation of
TA in DMF was achieved by adding 108.1 mg of TA powder to 0.75 ml of pure
DMF before application of shear (we verified separately that the saturation
level of TA in pure DMF is 92.6 mg per 0.75 ml). The outcome was not sensitive
to the time it took most of the TA powder to dissolve: the average size of the
crystals obtained was the same in another experiment whereby 85 mg of TA was
first completely dissolved in 0.75 ml of pure DMF at 25 °C, and an additional
23 mg of TA powder was added to the solution immediately before the start of
cell rotation (open green square). Finally, the open yellow circle corresponds to


redissolved crystals in Fig. 1e, f; see also Fourier-transform infrared
spectroscopy in Supplementary Fig. 13 and single-crystal X-ray diffraction
in Supplementary Information section 2) are free of any PIL-1,
match the spectra of TA·2DMF crystals reported in the literature, and are
of crystallographic quality (in terms of single-crystal X-ray diffraction)
as good as those of TA crystals grown by conventional recrystallization
or solvent evaporation (Supplementary Fig. 12c). In sharp contrast, if
the same solution or protocol is used but no rotation is applied, the
ill-shaped (Fig. 1b) crystals are only about 2 µm long after 10 min of
growth (black line in Fig. 1d). When PIL-1 is absent but shear is applied,
the needle-like crystals are about 44 µm long at 10 min (green line in
Fig. 1d). Neither these nor the other control experiments
summarized in Fig. 1d (see also Fig. 4 for growth from powders
using different shear rates, monomers or polymers of different length)
yield crystals of sizes comparable to those grown under shear and with
PIL-1 present.
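
The supersaturation bookkeeping behind these comparisons can be made explicit. The sketch below uses only masses quoted in the text and in the caption of Fig. 1 (all per 0.75 ml of solvent); the saturation level in the PIL-1/DMF mixture is not stated directly and is inferred here by subtraction, which is an assumption of this sketch, as are the function names.

```python
def absolute_excess_mg(dissolved_mg, saturation_mg):
    """Mass of solute above the saturation level (mg per 0.75 ml)."""
    return dissolved_mg - saturation_mg


def supersaturation_ratio(dissolved_mg, saturation_mg):
    """Relative supersaturation: dissolved mass over saturation mass."""
    return dissolved_mg / saturation_mg


# Values quoted in the text and in the Fig. 1 caption (mg per 0.75 ml).
PURE_DMF_SATURATION_MG = 92.6   # saturation level of TA in pure DMF
CONTROL_TA_MG = 108.1           # TA added in the polymer-free control
PIL_TA_MG = 47.0                # TA in the TA/PIL-1/DMF mixture
PIL_EXCESS_MG = 15.5            # stated excess over saturation with PIL-1

# Inferred (not quoted) saturation level in the PIL-1/DMF mixture.
pil_saturation_mg = PIL_TA_MG - PIL_EXCESS_MG

control_excess_mg = absolute_excess_mg(CONTROL_TA_MG, PURE_DMF_SATURATION_MG)

print(round(control_excess_mg, 1), round(pil_saturation_mg, 1))
print(round(supersaturation_ratio(PIL_TA_MG, pil_saturation_mg), 2),
      round(supersaturation_ratio(CONTROL_TA_MG, PURE_DMF_SATURATION_MG), 2))
```

The polymer-free control is matched to the PIL-1 experiment in absolute excess (15.5 mg in both cases); under the inferred saturation level the relative supersaturation is nevertheless higher with PIL-1, which is worth keeping in mind when comparing the two growth curves.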

Importantly, similar growth enhancement is observed for other, 
structurally diverse substances. This claim is supported by the size 
comparisons (shear + PIL versus no-shear + PIL, same growth times, 
all at room temperature) in Fig. 3a as well as the corresponding images
an experiment in which 108.1 mg of TA (but no PIL) was first completely
dissolved in 0.75 ml of DMF at a slightly elevated temperature of 36 °C and then
subjected to shear at 25 °C for 10 min. In other words, oversaturation in this
experiment was achieved by cooling. In all experiments with non-zero shear,
the mean shear rate was γ̇ = 167 s⁻¹. e, ¹H NMR spectra of the washed and
redissolved TA single crystals grown in PIL-1/DMF at γ̇ = 167 s⁻¹ shear flow (blue)
and by evaporation of DMF from the TA/DMF solution without any PIL (black).
The chemical structure in the inset is TA. f, Powder X-ray diffraction spectra of
TA crystals grown in PIL-1/DMF at γ̇ = 167 s⁻¹ shear flow (blue) compared to
literature data for pure TA crystals grown in DMF (black). g, h, Scheme of the
conjectured mechanism: the polymer (blue) and the crystallizing substance
(green) compete for shared solvent (grey). When shear flow is absent (g), the
amount of solvent sufficient to solvate the entangled polymer is lower than
that to solvate the polymer disentangled by shear flow (h). The disentangling
polymer 'steals' this additional solvent from the molecules of the crystallizing
substance, causing these molecules to attach to the nearby crystal (the purple
arrow in both panels points to the same particle). Orange arrows illustrate the
vector field of fluid velocity in the case of a mean shear flow. For effects of shear
on nucleation, see the discussion in the main text and Fig. 4.


of crystals of various small molecules, inorganic salts, metal–organic
complexes, and even some proteins (Fig. 3b and Supplementary Figs.
12–35). The average increase in the longest linear dimensions of the
crystals is about 16-fold, as high as 42-fold for NaI, 171-fold for TA, and
never smaller than 2-fold. The phases and crystallinities of the crystals
match those grown over much longer times via traditional solvent
evaporation (crystallinities are annotated as percentages in Fig. 3a and
plotted in Supplementary Fig. 12; powder X-ray diffraction spectra are
shown in Supplementary Figs. 16–28 and 31–35). BET surface areas of
porous functional materials are improved with respect to synthesis
by conventional methods (without PIL) by 51% for both the porous
organic cage 17 and the covalent organic framework 20, and by 24% for
the metal–organic framework 19. Although absolute values generally
depend on the synthetic protocol and activation method, the increase
we observe is systematic (with the same activation method for each
pair; see Supplementary Fig. 41), and may reasonably be attributed to
defects present in the samples grown under shear.

Regarding the choice of polymers used in the growth experiments
(Fig. 2a), uncharged ones, such as poly(methyl methacrylate)
(PMMA) or polyvinylidene fluoride (PVDF), also give similar results



Fig. 2 | Various polymers used for shear-enhanced crystallization. a, Most
reliable results were obtained using polyionic liquid polymers bearing either
positive (PIL-1–PIL-5 and PIL-7) or negative (PIL-6) charges. PIL-1 is
poly(3-cyanomethyl-1-vinylimidazolium bis(trifluoromethanesulfonyl)imide); PIL-2 is
poly(3-cyanomethyl-1-vinylimidazolium tetraphenylborate); PIL-3 is
poly(3-hexyl-1-vinylimidazolium iodide); PIL-4 is poly(1-methyl-4-vinylpyridinium
bis(trifluoromethanesulfonyl)imide); PIL-5 is poly(3-methyl-1-(4-vinylbenzyl)imidazolium
chloride); PIL-6 is poly(tetrabutylammonium 4-styrenesulfonate);
and PIL-7 is quaternary ammonium polyethyleneimine. For synthetic details
and characterization, see Supplementary Information section 1. PMMA and
PVDF were also used but were not suitable for all solutes, with some of which
(for example, TA and NaI) they gelated. 11 refers to NaI. b, Representative size
distributions of TA crystals grown from approximately 2-µm TA powder in
PMMA/DMF and PVDF/DMF. c, Box plots of crystal sizes for the systems in b as
well as for the PIL-1 and no-polymer conditions. For all cases, the concentration
of polymer was 75 mg per 0.75 ml, growth time 3 h, and mean shear rate γ̇ = 85 s⁻¹
(that is, half that in Fig. 1; for effects of γ̇ on crystal size, see Fig. 4).
Molecular weights were 996 kg mol⁻¹ (PMMA), 275 kg mol⁻¹ (PVDF) and 402
kg mol⁻¹ (PIL-1). In c, the elements of the box plots are: 25% and 75% quartiles
(edges of the boxes), median (midlines), and maximum/minimum values
(whiskers). Numbers of crystals on which these statistics are based are: no PIL,
248; PIL-1, 235; PMMA, 284; PVDF, 242.


(see Fig. 2b, c), although with some solutes (for example, TA and NaI)
they gelate. Both negatively and positively charged PILs are more
robust, confirming previous reports that ionic liquids are versatile
solvents compatible with a wide range of solutes and suitable for
growing crystals (though never before in shear flow). Conveniently,
with the selection of PILs shown in Fig. 2a, we have been able to make our
method compatible with solvents ranging from polar (DMF, dimethyl
sulfoxide (DMSO), water and methanol) to less polar (dichloromethane,
DCM) by changing the polarity (or sign) of the pendant-chain
charges or by varying counterions from small halogen anions to the
large tetrabutylammonium (TBA) cation (see specific experimental
procedures in Supplementary Information section 2.2).

To better understand the mechanism of shear-enhanced crystal 
growth, we performed a series of experiments in which we systematically
varied the shear rates and polymer chain length. In these experiments,
summarized in Fig. 4, we aimed to eliminate any effects of initial
nucleation, whose dependence on shear may be convoluted and whose
mechanisms are still unclear (see refs. " and references therein).
Accordingly, we grew the crystals by ripening of TA powders (average
particle size 2 ± 0.5 µm) in PIL-1/DMF, in contrast to growing the crystals
from uniform solutions (see above and Figs. 1, 3). The histogram in
Fig. 4a provides evidence that, when other parameters are kept constant,
the size of the crystals increases with increasing shear rate and
already at γ̇ = 167 s⁻¹ further growth becomes limited by the 1 mm gap
between the walls of the Couette cell. The distributions in Fig. 4b demonstrate
that at a given shear rate, for the same concentration of PIL-1
monomers and same excess (15.5 mg per 0.75 ml of DMF) of TA mass
over saturation level (Fig. 4c), crystal sizes increase when the lengths
of the PIL-1 polymer chainsincrease. In parallel, the rheological data 
in Fig. 4d demonstrate that the viscosities of the solutions also increase 
in the same order, while the solubility c, of TA plotted in Fig. 4c 
decreases. This is an important observation asit strongly indicates 
that the phenomena we describe cannot be rationalized simply by 
mixingaccelerating transport between crystallites and thus facilitating 
Ostwald ripening. Both diffusion of macromolecules and mixing slow 
down in viscous media, whereas the diffusion of small molecules in 
polymer solutionsis either unaffected by increased polymer length or 
slows down slightly”, The solubility c, of TA in PIL-1/DMF solutions 
is3.5 times lower than in pure DMF and 1.7 times lower than in mono- 
mer/DMF solution (Fig. 4c). Therefore, from the scaling arguments 
presented in Supplementary Information section 7, we would expect 
that classical, transport-limited Ostwald ripening of TA in PILA/DMF 
mixtures is at least (3.5)”” = 7 times slower than in pure DMF and 
(L7)"7 = 1.25 times slower than in monomer/DMF solution (opposite 
to the actual trend in Fig. 4b) and should slow down with (or, at least,
not be affected by) the polymer's molecular weight, which is not the
case. We also ruled out potential thermal effects from viscous heating
(see Supplementary Information section 4.3). Finally, the curves in
Fig. 4d, e capture the decrease of viscosity with increasing shear rate.
This so-called ‘shear thinning’ can be ascribed to polymers or their
aggregates disentangling or unfolding in elongational and shear flows;
this phenomenon was predicted by de Gennes and experimentally
confirmed by analyses of bulk properties and even by direct
single-molecule imaging. We note that the decrease in viscosity shown in
Fig. 4d, e, owing to some changes of polymer solution microstructure,
necessarily begins at shear rates below 10–100 s⁻¹, and therefore these
local microstructure changes must be activated by the shear
(25–167 s⁻¹) that we apply in crystal growth experiments (for a detailed
discussion of rheology, see Supplementary Information section 6).
In this context, we observe that prolonged shearing of the PIL-1/DMF
solution reduces the number-averaged hydrodynamic radius r_h
observed by DLS (for example, from about 800 nm to 60 nm at
γ̇ = 167 s⁻¹ for 3 h), although these experiments themselves do not
provide detailed insight into the microscopic changes in the polymer’s
structure (disentanglement, unfolding, breaking of gel-like structures
or destruction of the aggregates the PILs are known to form and
which may break under shear flow).
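
As a rough consistency check on the shear rates quoted above, the mean shear rate in a narrow-gap Couette cell can be estimated as wall speed divided by gap width. The sketch below is not the authors' calculation: the 400 rpm rotation speed is the value quoted alongside 167 s⁻¹ in the Fig. 3 caption, the 1 mm gap is the wall distance given in the Fig. 5 caption, and the narrow-gap formula, the inferred effective radius and all function names are assumptions made here for illustration.

```python
import math


def angular_velocity(rpm: float) -> float:
    """Convert revolutions per minute to radians per second."""
    return 2.0 * math.pi * rpm / 60.0


def mean_shear_rate(rpm: float, radius_m: float, gap_m: float) -> float:
    """Narrow-gap Couette estimate: wall speed divided by gap width (s^-1)."""
    return angular_velocity(rpm) * radius_m / gap_m


def implied_radius(shear_rate: float, rpm: float, gap_m: float) -> float:
    """Effective cylinder radius (m) that reproduces a quoted shear rate."""
    return shear_rate * gap_m / angular_velocity(rpm)


GAP_M = 1.0e-3         # gap between the cell walls quoted in the text
QUOTED_RPM = 400.0     # rotation speed quoted together with 167 s^-1
QUOTED_SHEAR = 167.0   # mean shear rate, s^-1

# Assumption: this radius is inferred here; it is not reported in the text.
radius_m = implied_radius(QUOTED_SHEAR, QUOTED_RPM, GAP_M)
print(f"implied effective radius: {radius_m * 1e3:.2f} mm")

# In this approximation the shear rate scales linearly with rotation speed.
for shear in (25.0, 85.0, 167.0):
    rpm = QUOTED_RPM * shear / QUOTED_SHEAR
    print(f"{shear:6.1f} s^-1  ->  {rpm:6.1f} rpm")
```

Because the inferred radius is only a few times the gap, the narrow-gap estimate should be read as an order-of-magnitude check; the actual cell design is described in the Supplementary Information.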

All of these observations substantiate a plausible mechanism involving
(1) shear-dependent solvation changes and (2) differences in the
local shear rates around larger (rather than smaller) crystals. Specifically,
as the polymer chains disentangle or their aggregates distort
or break under shear, they become better exposed to the solvent and

Fig. 3 | Shear-enhanced growth of an additional 19 different crystals in the
presence of polyionic liquids. a, Sizes of crystals (note logarithmic vertical
scale) obtained with shear (red) and without shear (blue) under otherwise
identical conditions. The elements of the box plots are: 25% and 75% quartiles
(edges of the box), median (midline), and maximum/minimum values
(whiskers). Numbers of crystals analysed for each substance are given below
(along with experimental conditions) and denoted as n₁ for with-stirring
conditions and n₂ for no-stirring conditions. Crystal quality (percentages of
crystalline phase evaluated by powder X-ray diffraction analysis) is indicated by
numbers on bar plots, in red font for crystals grown with PIL under shear and in
blue font for crystals grown with a conventional method (see Supplementary
Figs. 11, 12 for details). b, c, The specific substances we tested (b) and the
crystals they typically grow under shear (scale bars = 50 µm, with the exception
of 20, where the scale bar is 0.5 µm) (c). All crystals were grown in a Couette cell
(Fig. 1a; see Supplementary Information section for cell design) with gap
1 mm and at a constant shear rate γ̇ = 167 s⁻¹ (ω = 400 rpm). Various PILs (both
positively and negatively charged) were used (chosen to be miscible with the
solute or solvent) and their molar concentrations are given in terms of repeat
units. Unless otherwise specified, the solvent was DMF. C values are the
concentrations of the crystallizing solutes. Times are those of growth under
shear and were chosen such that the formation of first crystals could be
discerned, in many cases by the naked eye. Detailed experimental conditions,
size distributions and additional images of crystals grown with and without
shear can be found in Supplementary Information section 2.1. 1, TA, growing
time 10 min (C = 0.30 M, C_PIL …). 2, 2,5-Dihydroxyterephthalic acid, 10 min
(C = 0.2 M, C_PIL = 0.97 M; n₁ = 47, n₂ = 44). 3, Anthracene-9-carboxylic acid,
20 min (C = 0.45 M, C_PIL = 1.45 M; n₁ = 30, n₂ = 37).
4, -cyanomethyl-vinylimidazolium bromide, 5 min (C = 0.47 M, C_PIL = 1.5 M,
methanol solvent; n₁ = 25, n₂ = 25). 5, Ethyl viologen dibromide, 10 min
(C = 0.24 M, C_PIL = 1.17 M, methanol; n₁ = 43, n₂ = 77). 6, p-Nitroaniline, 10 h
(… .42 M; n₁ = 73, n₂ = 33). 7, Tetrathiafulvalene, 5 min (C = 0.08 M,
C_PIL … 1 v/v DCM/methanol; n₁ = 71, n₂ = 90).
8, 1,2,4,5-Tetrakis(4-carboxyphenyl)benzene, 2 h (C = 0.05 M, C_PIL …).
9, meso-Tetra(carboxyphenyl)porphyrin, 10 min (C = 0.005 M, C_PIL …;
n₁ = 35, n₂ = 25). 10, β-Cyclodextrin, 10 h (C = 0.09 M, C_PIL = 0.91 M; n₁ = 26,
n₂ = 13). 11, Sodium iodide, 20 min (C = 0.47 M, C_PIL = 1.51 M; n₁ = 49, n₂ = 57).
12, Phosphotungstic acid, 3 h (C = 0.06 M, C_PIL = 0.71 M, water; n₁ = 11,
n₂ = 22). 13, Rhodium(I) tris(triphenylphosphine) chloride, 6 min (C = 0.04 M,
C_PIL = 0.49 M, DCM; n₁ = 35, n₂ = 43). 14, Iron(III) meso-tetraphenylporphine
chloride, 3 min (C = 0.02 M, C_PIL = 1.17 M, DCM; n₁ = 43, n₂ = 29). 15, Hen egg
white lysozyme, 2 h (C = 0.002 M, C_PIL = 1.83 M, NaAc–HAc buffer; n₁ = 21,
n₂ = 31). 16, Thaumatin, 16 h (C = 0.005 M, C_PIL = 0.71 M, ADA buffer; n₁ = 99,
n₂ = 54). 17, CC3-R porous molecular cage, 10 min (C = 0.01 M, C_PIL = 0.54 M,
DCM; n₁ = 57, n₂ = 47). 18, Metal–organic polyhedron (MOP), 10 min
(C = 0.003 M, C_PIL = 1.42 M, methanol; n₁ = 40, n₂ = 14). 19, HKUST-1 MOF
grown from TA and Cu(NO₃)₂·3H₂O, 10 h (C_TA = 0.27 M,
C_Cu(NO₃)₂·3H₂O = 0.33 M, C_PIL …; n₁ = 32, n₂ = 31). 20, TAPB-BTCA
covalent organic framework, 15 min (C = 0.02 M, C_PIL = 0.12 M,
DMSO; n₁ = 112, n₂ = 112).

Fig. 4 | Effects of shear rates and polymer’s chain length on crystal growth.
a, Distributions of sizes of the TA crystals grown for 3 h from the same TA
powder (not to be confused with experiments on growth from the
oversaturated solutions presented in Figs. 1, 3) in PIL-1/DMF (PIL-1 molecular
weight 402 kg mol⁻¹) under different mean shear rates. Each histogram is based
on the analysis of (top to bottom panels) 1,000, 235, 100 and 131 crystals.
b, Distributions of sizes of TA crystals grown for 3 h from the same TA powder
under γ̇ = 85 s⁻¹ mean shear rate in pure DMF (top panel), in monomer of PIL-1 in
DMF (300 mg per 0.75 ml), and in PIL-1/DMF (300 mg per 0.75 ml) with two
different molecular weights (indicated). Each histogram is based on the
analysis of, top to bottom panels, 315, 348, 100 and 383 crystals. c, Equilibrium
solubility (molar concentration at saturation) of TA in the same liquids as in
panel b, measured with no shear. d, e, Measurements of viscosity at different
shear rates. There is strong ‘primary’ shear thinning at low shear rates (10⁻¹ s⁻¹
to 10 s⁻¹, d), and a weaker ‘secondary’ shear thinning at higher shear rates
(>10 s⁻¹, e); both are facilitated by addition of solvent (compare the blue curves
measured for the same molecular weight of 402 kg mol⁻¹, but at different
concentrations, as indicated in the legend) or by increasing the molecular
weight (compare orange curve of PIL-1 with molecular weight 849 kg mol⁻¹ at
concentration 300 mg per 0.75 ml DMF (same as in the crystal growth
experiments) to dark blue curve for molecular weight 402 kg mol⁻¹ at 300 mg
per 0.75 ml DMF (2 mM)). As more solvent is added, the ‘primary’ shear thinning
(d) progresses further into higher shear rates and reaches lower viscosities,
indicating that disentanglement is limited by available solvent. In the high
shear region (e), ‘secondary’ thinning also becomes more prominent when
more solvent is added or when molecular weight is increased, and whenever a
clear plateau could be discerned, onset of thinning (shown by black dots, which
are intersections of dashed asymptotes) shifts to lower shears with more
dilution (50 s⁻¹ for 2 mM versus 115 s⁻¹ for 1 mM). For a more dilute, mM sample,
the ‘secondary’ thinning (e) begins at an even lower shear rate and merges into
the first, low-shear thinning process, thus showing no clear plateau of viscosity.
Curves from the yellow rectangle in d are plotted in e on a semi-logarithmic
scale with viscosity for each curve normalized by its value η₀ at 9%. Viscosity
of pure DMF (grey dashed line) is taken from literature. The curve for the
monomer (green) is measured at higher concentration (2 g/0.75 ml DMF) than
for PILs, since more dilute monomer solution has even lower viscosity and
could not be reliably measured by our rheometer.

Fig. 5 | Effects of particle size on local shear rates. a, Maximum values of
shear rate near an object (a rod as in panel b) in Couette flow (d = 1 mm, mean
γ̇ = 167 s⁻¹ indicated by the horizontal dotted green line) as a function of the
object’s size and corner sharpness (r_c is the radius of corner curvature). Dashed
curves were calculated for a non-rotating rod (with two of its faces kept parallel
to the cell walls). Solid curves were calculated for a rod that is being freely
rotated by the flow. b, Theoretical map of shear rate near a long, freely rotating
rod in horizontal Couette flow (distance between walls d = 1 mm, mean
γ̇ = 167 s⁻¹). The rod’s axis points into the page; the rod’s cross-section is a square
(20 µm per edge) with rounded corners (r_c = 2 µm). Being freely rotated by the
shear flow, the rod maintains a constant angular velocity (clockwise, as
indicated by the circular grey arrow). Black curves are streamlines. Black cones
indicate velocity direction and magnitude. This map depicts one instant of
time; as the object rotates, maximal local shear rate oscillates in phase with this
rotation, but regions of high shear rate remain mostly localized near the sharp
corners at all times. The liquid is assumed to be Newtonian, although very
similar results were obtained when we used realistic dependence of viscosity
on shear rate (from Fig. 4d, solid blue curve). Details of all these calculations are
described in Supplementary Information section 4 (see also Supplementary


their effective volume of solvation layer increases. This phenomenon
is in line with previous experiments on both polymers and ionic
liquids, whereby adding more solvent enhances shear thinning, hinting
that shear may cause greater demand for solvent by the polymer
or ionic liquid; we found that this trend also holds for PIL-1/DMF (see
blue curves in Fig. 4d). What this means for our experiments is that the
disentangled polymers can effectively compete for solvent with the
solute which, upon losing the solvent, starts to crystallize (this effect
is loosely analogous to ‘salting out’, which causes the precipitation
and crystallization of various solutes, including biomolecules).
We observe that this mechanism explains why some other polymers
are less robust or less general than PILs (such as PMMA and PVDF;
see Fig. 2b, c), because they gelate or precipitate before the solute can
develop large crystals. By contrast, PILs do not precipitate easily and
are known to be versatile solvents, not unlike molten salts.

The second part of the proposed mechanism rationalizes the
preferential growth of larger (rather than smaller) crystals which, as
we mentioned above in the context of crystal sizes increasing with
viscosity and decreasing with solubility, cannot be ascribed to classical
Ostwald ripening. Instead, the explanation might lie in the fact
that in a mean shear flow, the local shear near a particle with sharp
edges increases with particle size. For our system, such an increase
is supported by the results of computational fluid dynamics simulations
of liquid flows around rod-shaped particles with square cross-sections
ranging from 2 µm to 1 mm and moving freely in a Couette cell
(Fig. 5b, c). Therefore, the disentanglement of PILs and ‘competition’
for solvent are expected to be more pronounced near larger particles,
which therefore grow preferentially; conversely, the smaller particles
remain more soluble. Experimentally, we found that a mean shear of
γ̇ = 167 s⁻¹ indeed decreases global solubility of TA in PIL-1/DMF (molecular
weight 8.49 × 10⁵ g mol⁻¹) from 167.2 ± 2.8 mM to 161.6 ± 0.8 mM
(see Supplementary Table 1 in Supplementary Information section
2.1). We note that this 1%–5% modulation of solubility is an order of
magnitude larger than the typical values sufficient to drive Ostwald
ripening (for example, <0.1% for 10-µm crystals of TA, assuming a
surface energy of 100 mJ m⁻² in the Ostwald–Freundlich/Kelvin
equation).
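
The quoted 1%–5% modulation can be checked directly from the two solubility values. The sketch below is an illustrative calculation only: the function name is hypothetical, and combining the quoted uncertainties at their extremes is an assumption made here, not necessarily the error treatment used by the authors.

```python
def relative_decrease(before: float, after: float) -> float:
    """Fractional drop in solubility when going from `before` to `after`."""
    return (before - after) / before


# Solubility of TA in PIL-1/DMF (mM) without and with shear, from the text.
c_rest, c_rest_err = 167.2, 2.8
c_shear, c_shear_err = 161.6, 0.8

central = relative_decrease(c_rest, c_shear)
# Assumption: bounds obtained by pushing both uncertainties to their extremes.
smallest = relative_decrease(c_rest - c_rest_err, c_shear + c_shear_err)
largest = relative_decrease(c_rest + c_rest_err, c_shear - c_shear_err)

print(f"central estimate: {central * 100:.1f}%")
print(f"range: {smallest * 100:.1f}% to {largest * 100:.1f}%")
```

The central value is a few per cent, and the extreme combinations bracket it at roughly one to five per cent, consistent with the range stated above.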

However, if it were just simple disentanglement or single-chain
unfolding, it would have been reversible, and therefore some portion
of crystals such as TA would re-dissolve shortly after stopping the rotation
of the Couette cell: PIL-1 microstructure would have returned to
its original state, releasing the captured solvent. In experiments, TA
crystals did not re-dissolve even 15 h after the rotation had ceased,
indicating that relaxation of PILs to the initial state is inhibited, perhaps
owing to a gel-like macromolecular crowding in the concentrated solutions
we use (this would be in line with onset of shear thinning already at
very low shear rates <10 s⁻¹ shown in Fig. 4d, as well as linear response
domain at very low strains (around 0.03) shown in the dynamic relaxation
data in Supplementary Fig. 50a).

Finally, if the mechanism we propose for shear-enhanced ripening
of powders (Fig. 2b, c and Fig. 4) is correct, it should also apply to
the growth starting from oversaturated solutions (Fig. 1 and Fig. 3),
although, in the nucleation phase, it would strongly suppress smaller
nuclei in favour of the larger ones, and we would expect large crystals
to grow from such solutions faster than from powders. This is indeed
the case and solution growth is approximately an order of magnitude
faster. For instance, TA crystals reach a size of 440 µm on average
after 10 min of growth from solution (Fig. 1d) versus an average size
of 336 µm after 3 h of growth from powder (Fig. 4a, bottom panel).
Furthermore, the impact of this mechanism during nucleation phase
must far outweigh the previously reported effects of shear flow on
nucleation, since those effects predict the negative effect of
shear on the final crystal size, that is, the opposite of what we observe
(Fig. 4a).
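
The comparison of growth from solution and from powder can be made explicit with average linear growth rates. This is a minimal sketch using only the two sizes and times quoted above; dividing final average size by elapsed time is a simplification introduced here (it assumes roughly linear growth and neglects the initial particle size), and the function name is hypothetical.

```python
def mean_growth_rate(size_um: float, minutes: float) -> float:
    """Average linear growth rate in micrometres per minute."""
    return size_um / minutes


# Average TA crystal sizes and growth times quoted in the text.
solution_rate = mean_growth_rate(440.0, 10.0)     # growth from solution
powder_rate = mean_growth_rate(336.0, 3 * 60.0)   # ripening of powder

ratio = solution_rate / powder_rate
print(f"solution: {solution_rate:.1f} um/min")
print(f"powder:   {powder_rate:.2f} um/min")
print(f"ratio:    {ratio:.1f}")
```

On this crude measure, growth from solution is more than ten times faster than ripening from powder, in line with the order-of-magnitude statement.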

In summary, we showed that good-quality crystals of various kinds
can grow larger and more rapidly when subject to shear in the presence
of polymers. Since this trend is observed for crystals and polymers
of various types, it can reasonably be explained by physical effects
rather than the nuances of specific polymer–solute chemical interactions.
At the same time, such interactions might have more subtle
effects: we have seen, for instance, that linear PIL-1 and branched PIL-7
can give crystals of the same space group but different habits (see
Supplementary Fig. 37). Such effects and also more detailed theoretical
models certainly merit further study of this interesting non-equilibrium
system. From a practical point of view, we anticipate that our technically
straightforward, constant-temperature method will be useful as
a means of accelerating crystal growth, especially for substances that
must be kept within a narrow temperature range (for example, proteins)
or cannot be recrystallized (such as metal–organic frameworks and
covalent organic frameworks).

Structurally intact tropical forests sequestered about half of the global terrestrial
carbon uptake over the 1990s and early 2000s, removing about 15 per cent of
anthropogenic carbon dioxide emissions. Climate-driven vegetation models
typically predict that this tropical forest ‘carbon sink’ will continue for decades.
Here we assess trends in the carbon sink using 244 structurally intact African tropical
forests spanning 11 countries, compare them with 321 published plots from Amazonia
and investigate the underlying drivers of the trends. The carbon sink in live
aboveground biomass in intact African tropical forests has been stable for the three
decades to 2015, at 0.66 tonnes of carbon per hectare per year (95 per cent confidence
interval 0.53–0.79), in contrast to the long-term decline in Amazonian forests.
Therefore the carbon sink responses of Earth’s two largest expanses of tropical forest
have diverged. The difference is largely driven by carbon losses from tree mortality,
with no detectable multi-decadal trend in Africa and a long-term increase in
Amazonia. Both continents show increasing tree growth, consistent with the expected
net effect of rising atmospheric carbon dioxide and air temperature. Despite the
past stability of the African carbon sink, our most intensively monitored plots suggest
a post-2010 increase in carbon losses, delayed compared to Amazonia, indicating
asynchronous carbon sink saturation on the two continents. A statistical model
including carbon dioxide, temperature, drought and forest dynamics accounts for
the observed trends and indicates a long-term future decline in the African sink,
whereas the Amazonian sink continues to weaken rapidly. Overall, the uptake of
carbon into Earth's intact tropical forests peaked in the 1990s. Given that the global
terrestrial carbon sink is increasing in size, independent observations indicating
greater recent carbon uptake into the Northern Hemisphere landmass reinforce our
conclusion that the intact tropical forest carbon sink has already peaked. This
saturation and ongoing decline of the tropical forest carbon sink has consequences
for policies intended to stabilize Earth’s climate.


Tropical forests account for approximately one-third of Earth's terrestrial
gross primary productivity and one-half of Earth’s carbon
stored in terrestrial vegetation. Thus, small biome-wide changes in
tree growth and mortality can have global impacts, either buffering or
exacerbating the increase in atmospheric CO₂. Models, ground-based
observations, airborne atmospheric CO₂ measurements,
inferences from remotely sensed data and synthetic approaches
each suggest that, after accounting for land-use change, the remaining
structurally intact tropical forests (that is, those not affected by
direct anthropogenic impacts such as logging) are increasing in carbon
stocks. This structurally intact tropical forest carbon sink is estimated
at approximately 1.2 Pg C yr⁻¹ over 1990–2007 using scaled inventory
plot measurements. Yet, despite its relevance to policy, changes in
this key carbon sink remain highly uncertain.

Globally, the terrestrial carbon sink is increasing. Between 1990
and 2017 the land surface sequestered about 30% of all anthropogenic
carbon dioxide emissions. Rising CO₂ concentrations are thought
to have boosted photosynthesis more than rising air temperatures
have enhanced respiration, resulting in an increasing global terrestrial
carbon sink. Yet, for Amazonia, recent results from repeated
censuses of intact forest inventory plots show a progressive two-decade
decline in sink strength primarily due to an increase in carbon losses
from tree mortality. It is unclear if this simply reflects region-specific
drought impacts, potentially chronic pan-tropical impacts of
heat-related tree mortality, or internal forest
dynamics as past increases in carbon gains leave the system. A more
recent deceleration of the rate of increase in carbon gains from tree
growth is also contributing to the declining Amazon sink. Again, it
is not known whether this is a result of pan-tropical saturation
of CO₂ fertilization, or rising air temperatures, or is simply a regional
drought impact. To address these uncertainties, we (1) analyse an
unprecedented long-term inventory dataset from Africa, (2) pool the
new African and existing Amazonian records to investigate the putative
environmental drivers of changes in the tropical forest carbon
sink, and (3) project its likely future evolution.

We collected, compiled and analysed data from structurally intact
old-growth forests from the African Tropical Rainforest Observation
Network (217 plots) and other sources (27 plots) spanning the period
1 January 1968 to 31 December 2014 (Extended Data Fig. 1; Supplementary
Table 1). In each plot (mean size, 1.1 ha), all trees ≥100 mm in stem
diameter were identified, mapped and measured at least twice using
standardized methods (135,625 trees monitored). Live biomass carbon
stocks were estimated for each census date, with carbon gains and
losses calculated for each interval (Extended Data Fig. 2).

Fig. 1 | Long-term carbon dynamics of structurally intact old-growth
tropical forests in Africa and Amazonia. a–c, Trends in net
aboveground live biomass carbon (a), carbon gains to the system from wood
production (b), and carbon losses from the system from tree mortality (c),
measured in 244 African inventory plots (blue lines) and contrasting
published Amazonian inventory data (brown lines; 321 plots). For Africa we
show complete years with at least 25 plots monitored; for Amazonia we show
the published record. Shading corresponds to the 95% CI, with darker shading
indicating a greater number of plots monitored in that year (the lightest
shading indicates the minimum 25 plots monitored). The CI for the Amazonian
dataset is omitted for clarity, but can be seen in Fig. 3. Slopes and
P values are from linear mixed effects models (see Methods).


Continental carbon sink trends
We detect no long-term trend in the per unit area African tropical
forest carbon sink over three decades to 2015 (P = 0.167; Fig. 1). The
aboveground live biomass sink averaged 0.66 tonnes of carbon per
hectare per year (0.66 Mg C ha⁻¹ yr⁻¹ with 95% confidence interval (CI)
of 0.53–0.79 and n = 244) and was significantly greater than zero for
every year since 1990 (Fig. 1; P < 0.001 for each time period in Table 1).
Although very similar to past reports (0.63 Mg C ha⁻¹ yr⁻¹), this first
estimate of the temporal trend in Africa contrasts with the significantly
declining (P = 0.038) Amazonian trend (Fig. 1). A linear mixed effects
model shows a significant difference in the slopes of the sink trends for
the two continents over the common time window (pooled data from
both continents, common time window, 1 January 1983 to mid-2011;
P = 0.017). Therefore, the per unit area sink strength of the two largest
expanses of tropical forest on Earth diverged in the 1990s and 2000s.
The proximal cause of the divergent sink patterns is a significant
increase (P = 0.002) in carbon losses (from tree mortality, that is, the
loss of carbon from the live biomass pool) in Amazonian forests, with
no detectable trend over three decades in African forests (P = 0.403;
Fig. 1; Table 1). A linear mixed effects model using pooled data shows
a significant difference in slopes of carbon losses between the two
continents over the common time window (P = 0.027; 1 January 1983
to mid-2011). Long-term trends in carbon gains (from tree growth and
newly recruited trees) show significant increases on both continents
(P = 0.037 for Africa; P < 0.001 for Amazon; Fig. 1), and we could detect
no difference in slopes between the continents (P = 0.348; carbon gains
from tree growth alone also show no continental difference in long-term
trends, P = 0.322). However, an assessment of how underlying environmental
drivers affect carbon gains and losses is needed to understand
the ultimate causes of the divergent sink patterns.


Understanding the carbon sink trends

We first investigate those environmental drivers exhibiting long-term
change that affect photosynthesis and respiration in theory-driven
models: atmospheric CO₂ concentration, surface air temperature and
water availability. Bivariate models (Fig. 2) and a linear mixed effects
model of carbon gains (Extended Data Table 1), with censuses nested
within plots, and pooling the new African and published Amazonian
data, show a significant positive relationship with CO₂ (P = 0.021 in
Fig. 2; P = 0.001 in Extended Data Table 1), and significant negative
relationships with mean annual temperature (MAT; P < 0.001 in
Fig. 2 and Extended Data Table 1) and drought (P = 0.003 in Fig. 2;
P < 0.001 in Extended Data Table 1), with drought measured as the
maximum climatological water deficit (MCWD). These results are
consistent with a positive CO₂ fertilization effect, and negative effects

Table 1 | Carbon sink in structurally intact old-growth tropical forests in Africa, Amazonia and the pan-tropics, 1980–2040

Period       Number of plots   Per unit area aboveground live biomass C sink (Mg C ha⁻¹ yr⁻¹) Total C sink* (Pg C yr⁻¹)
             Africa   Amazon   Africa               Amazon               Pan-tropics†         Africa               Amazon               Pan-tropics†
1980–1990    45       73       0.33 (0.06–0.63)     0.35 (0.06–0.59)     0.35 (0.07–0.62)     0.28 (0.05–0.53)     0.49 (0.08–0.82)     0.87 (0.16–1.52)
1990–2000    96       72       0.67 (0.43–0.89)     0.53 (0.42–0.65)     0.57 (0.39–0.74)     0.50 (0.32–0.66)     0.68 (0.54–0.83)     1.26 (0.88–1.63)
2000–2010    194      201      0.70 (0.55–0.84)     0.38 (0.26–0.48)     0.50 (0.35–0.64)     0.46 (0.37–0.56)     0.45 (0.31–0.57)     0.99 (0.70–1.25)
2010–2015‡   184      172      0.66 (0.40–0.51)     0.24 (0.00–0.47)     0.40 (0.15–0.65)     0.40 (0.24–0.56)     0.27 (0.00–0.52)     0.73 (0.25–1.18)
2010–2020§   –        –        0.63 (0.36–0.89)     0.23 (−0.05–0.50)    0.38 (0.11–0.65)     0.37 (0.21–0.53)     0.28 (−0.05–0.54)    0.68 (0.17–1.16)
2020–2030§   –        –        0.59 (0.24–0.93)     0.12 (−0.29–0.51)    0.30 (−0.08–0.67)    0.31 (0.13–0.49)     0.42 (−0.29–0.52)    0.47 (−0.15–1.07)
2030–2040§   –        –        0.55 (0.08–0.99)     0.00 (−0.54–0.49)    0.21 (−0.29–0.67)    0.26 (0.04–0.47)     0.00 (−0.50–0.46)    0.29 (−0.46–0.97)

This table covers 1 January 1980 to 31 December 2014 and predictions to 31 December 2039. Mean values are in boldface, future predictions in italics, uncertainties in parentheses: 95% bootstrapped confidence intervals for 1980–2015, and 2σ for the predictions (2010–2040).

*The total continental C sink is the per unit area aboveground C sink multiplied by intact forest area (from a published reference; see Extended Data Table 2) and includes continent-specific estimates of three carbon-stock components that were not measured in the inventory plots: trees with a diameter at breast height of <100 mm, lianas and roots (see Methods).

†The per unit area pan-tropical aboveground live biomass C sink is the area-weighted mean of African, Amazonian and Southeast Asian sink values. Southeast Asian values were from published per unit area carbon sink data (n = 49 plots) for 1990–2015, with 1980–1990 assumed to be the same as 1990–2000 owing to very low sample sizes. The pan-tropical total C sink is the sum of African, Amazonian and Southeast Asian total continental carbon sink values. The continental sink in Southeast Asia is a modest and declining contribution to the pan-tropical sink, owing to the very small area of intact forest remaining, at 0.11 Pg C yr⁻¹, 0.08 Pg C yr⁻¹, 0.07 Pg C yr⁻¹ and 0.06 Pg C yr⁻¹ in the 1980s, 1990s, 2000s and 2010s, respectively; hence uncertainty in the Southeast Asian sink cannot reverse the pan-tropical declining sink trend.

‡The Amazonian sink in the 2010–2015 time window was calculated from 172 plots that were mostly measured between 1 January 2010 and mid-2011. The lack of temporal coverage later in this period probably has little impact on the results; adding modelled results for 1 January 2012 to 31 December 2014 gives a per unit area aboveground sink of 0.25 Mg C ha⁻¹ yr⁻¹ (0.00–0.49), which would increase the pan-tropical total C sink by 0.01 Pg C yr⁻¹.

§Per unit area total C sink for 2010–2020, 2020–2030 and 2030–2040 was predicted using parameters from Table 2, except for the 2010–2020 sink in Africa, which is the mean of the measured sink from 2010–2015 and the modelled sink from 2015–2020. For the Asian sink we assumed the same parameters as for Africa, because Asian forest median CRT is 61 years, close to the African median of 63 years.


of higher temperatures and drought on tree growth, consistent with
temperature-dependent increases in autotrophic respiration, and
temperature- and drought-dependent reductions in carbon assimilation. By
contrast, the equivalent models for carbon losses show no significant
relationships with CO₂ (P = 0.363 in Fig. 2; P = 0.344 in Extended Data
Table 1), MAT (P = 0.789 in Fig. 2; P = 0.804 in Extended Data Table 1) or
MCWD (P = 0.338 in Fig. 2; P = 0.325 in Extended Data Table 1).
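
The Table 1 footnotes state that the pan-tropical total C sink is the sum of the African, Amazonian and Southeast Asian continental sinks. The sketch below checks that statement against the measured periods. The values are transcribed from Table 1 and its footnote; matching the table rows to the footnote's decade labels (for example, 2010–2015 to "2010s") and the dictionary and function names are assumptions made here for illustration.

```python
# Total C sink values (Pg C per year) transcribed from Table 1 and its footnote.
AFRICA = {"1980s": 0.28, "1990s": 0.50, "2000s": 0.46, "2010s": 0.40}
AMAZON = {"1980s": 0.49, "1990s": 0.68, "2000s": 0.45, "2010s": 0.27}
SE_ASIA = {"1980s": 0.11, "1990s": 0.08, "2000s": 0.07, "2010s": 0.06}
REPORTED_PAN_TROPICAL = {"1980s": 0.87, "1990s": 1.26, "2000s": 0.99, "2010s": 0.73}


def pan_tropical_total(decade: str) -> float:
    """Sum of the three continental sinks for one decade."""
    return AFRICA[decade] + AMAZON[decade] + SE_ASIA[decade]


for decade, reported in REPORTED_PAN_TROPICAL.items():
    total = pan_tropical_total(decade)
    print(f"{decade}: summed {total:.2f}, reported {reported:.2f}, "
          f"difference {total - reported:+.2f}")
```

The summed and reported totals agree to within about 0.01 Pg C yr⁻¹, which is the size of discrepancy expected from rounding each continental value to two decimals.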

We further investigate the responses of carbon gains and losses (for
which the above analysis has no explanatory power) by expanding
our potential explanatory variables to include five more. These are
the changes in environmental conditions (CO₂-change, MAT-change,
MCWD-change, see Extended Data Fig. 3 for calculation details) and two
attributes of forests that may influence their response to the same
environmental changes: the plot mean wood density (which in old-growth
forests correlates with belowground resource availability) and the
plot carbon residence time (CRT, which measures how long fixed carbon
remains in the system and hence reflects when past increases in carbon
gains leave the system as elevated carbon losses).

The minimum adequate carbon gain model using our expanded
explanatory variables (best-ranked model using multimodel inference)
has a significant positive relationship with CO₂-change (P = 0.013), and
significant negative relationships with MAT (P = 0.001), MAT-change
(P < 0.001), MCWD (P < 0.001) and wood density (P = 0.015; Table 2;
model-average results are similar, see Methods and Supplementary
Tables 2–4). The retention of both MAT and MAT-change suggests that
higher temperatures correspond to lower tree growth, and that trees
only partially acclimate to recently rising temperatures, which further
reduces growth, consistent with warming experiments and observations.
The inclusion of higher wood density and its relationship to
lower carbon gains (Extended Data Fig. 4), alongside no significant
temporal trends in wood density (Extended Data Fig. 5), suggests that
old-growth forests with denser-wooded tree communities typically
have fewer available belowground resources, or such patterns may
also emerge from disturbance regimes lacking large-scale exogenous
events, consistent with previous studies.

The minimum adequate carbon gain model using our expanded
explanatory variables also highlights continental differences. Between
January 2000 and 31 December 2014, modelled African forest carbon
gains increased by 3.1%, compared with a 0.1% decline in Amazonia over
the same interval (Table 2). In Africa, from 2000 to 2015, the increase
in carbon gains was composed of a 3.7% increase from CO₂-change,
partially offset by increasing droughts depleting gains by 0.5%, and only
a slight decline in gains of 0.1% resulting from temperature increases
(Table 2), because the rate of temperature change (MAT-change)
decelerated over this time window (Extended Data Fig. 5). For Amazonia,
the same 3.7% increase in carbon gains due to CO₂-change was seen.
Opposing this trend were increasing droughts, combined with the greater
sensitivity to drought of Amazonian forests, which reduced carbon gains
by 2.7% (five times the impact in Africa), and temperature increases at
the same rate as in the past (that is, MAT-change is zero), which further
reduced gains by 1.1% (ten times the impact in Africa), leaving a net
change in gains slightly below zero (Table 2). Therefore, the stalling of
carbon gain increases in Amazonia in the decade to mid-2011 is a response
to drought and temperature and not due to an unexpected saturation
of CO₂ fertilization.
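
As a quick consistency check on the percentages quoted above, the
per-predictor contributions in the last column of Table 2 can be summed
for each continent. The sketch below assumes the contributions are
approximately additive (an assumption made here for illustration; the
paper estimates each one by varying only the focal predictor), and the
dictionary and function names are illustrative.

```python
# Per-predictor 2000-2015 change in carbon gains (%), read from Table 2.
# Assumption: contributions are treated as additive for this check.
GAIN_CHANGE_PCT = {
    "Africa": {
        "CO2-change": 3.69,
        "MAT": -0.67,
        "MAT-change": 0.58,
        "MCWD": -0.52,
        "wood density": 0.05,
    },
    "Amazonia": {
        "CO2-change": 3.71,
        "MAT": -1.07,
        "MAT-change": 0.00,
        "MCWD": -2.73,
        "wood density": 0.00,
    },
}


def net_gain_change(continent: str) -> float:
    """Sum the per-predictor contributions for one continent (in %)."""
    return sum(GAIN_CHANGE_PCT[continent].values())


if __name__ == "__main__":
    for name in GAIN_CHANGE_PCT:
        print(f"{name}: {net_gain_change(name):+.2f}%")
```

Rounded to one decimal place, the sums reproduce the 3.1% increase for
Africa and the 0.1% decline for Amazonia reported in the text.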

Overall, the larger modelled increase in carbon gains in Africa
relative to Amazonia appears to be driven by slower warming, fewer or less
extreme droughts, lower forest sensitivity to droughts, and overall
lower temperatures (African forests are on average ~1.1 °C cooler than
Amazonian forests, because they typically grow at higher elevations of
~200 metres above sea level). Other continental differences may also be
influencing the results, including higher nitrogen deposition in African
tropical forests due to the seasonal burning of nearby savannahs and
biogeographical history resulting in differing contemporary species
pools and resulting functional attributes.

The minimum adequate carbon loss model using our expanded
explanatory variables shows significantly higher losses with CO₂-change
(P = 0.026) and MAT-change (P < 0.001) and significantly lower losses
with MCWD (P = 0.030) and CRT (P < 0.001; Table 2). Thus, changes in
carbon losses appear to be largely a function of past carbon gains. First,
the greater losses in forests with shorter CRT conform to a 'high-gain,
high-loss' forest dynamics pattern. Second, wetter plots have a
longer growing season and thus higher gains and correspondingly
higher losses, explaining the negative relationship with MCWD.
Third, as increasing CO₂ levels result in additional carbon gains, after
some time these additional past gains leave the system, resulting in
greater carbon losses, which explains the positive relationship with
CO₂-change. Finally, in addition to these relationships with carbon
gains, the inclusion of MAT-change (P < 0.001) indicates tree mortality
induced by heat or by increased vapour pressure deficit. Overall, our
results imply that chronic long-term environmental change factors,
temperature and CO₂, rather than simply the direct effects of drought,
underlie longer-term trends in tropical forest tree mortality, although
other changes such as rising liana infestation rates seen in Amazonia
cannot be excluded.

The minimum adequate carbon loss model using our expanded
explanatory variables replicates the continental trends (Fig. 3). The
overall modelled lower loss rates in Africa reflect their longer CRT (69
years, 95% CI, 66-72) compared with Amazonian forests (56 years, 95%
CI, 54-59), while over the 2000-2015 window the much smaller modelled
increase in loss rates in Africa compared to Amazonia results from
a slower increase in warming and a stable CRT in Africa, compared to
continued warming at previous rates and a shortening CRT in Amazonian
forests (Table 2; Extended Data Fig. 5). Furthermore, given that losses
appear to lag behind gains, they should relate to the long-term CRT of
plots. This is what we find: the longer the CRT the smaller the increase
in carbon losses, with no increase in losses for plots with CRT > 77 years
(Extended Data Fig. 6). Consequently, owing to the typically longer
CRT of African forests, increasing losses in Africa ought to appear
10-15 years after the increase in Amazon losses began (around 1995).
Strikingly, in Africa the most intensely monitored plots suggest that
losses began increasing from about 2010 (Extended Data Fig. 7), and
plots with shorter CRT are driving the increase (Extended Data Fig. 8).
Thus, a mortality-dominated decline of the African carbon sink appears
to have begun very recently.

Fig. 2 | Potential environmental drivers of carbon gains and losses in
structurally intact old-growth tropical forests in Africa and Amazonia.
Aboveground carbon gains, from woody production (a-c), and aboveground
carbon losses, from tree mortality (d-f), are presented as time-weighted
mean values for each plot (that is, each census within a plot is weighted
by its length), against the corresponding values of atmospheric carbon
dioxide concentration (CO₂), temperature (MAT) and drought (MCWD), for
African (blue) and Amazonian (brown) inventory plots. For visual clarity
each data point therefore represents an inventory plot, and the shading
represents the total monitoring length, with empty circles corresponding
to plots monitored for <5 years and solid circles for plots monitored
for >20 years. Solid lines show significant trends and dashed lines show
non-significant trends calculated using linear mixed effects models with
census intervals (n = 1,566) nested within plots (n = 565), using an
empirically derived weighting based on interval length and plot area, on
the untransformed pooled Africa and Amazon dataset (see Methods). Slopes
and P values are from the same linear mixed effects models. Carbon loss
data and models are presented untransformed for comparison with carbon
gains, but transformation is needed to fit normality assumptions;
performing linear mixed effects models on transformed carbon loss data
does not change the presented significance trends, nor does including
all three parameters and transformed data in a model (see Extended
Data Table 1).


Future of the tropical forest carbon sink 

Our carbon gain and loss models (Table 2) can be used to make a tentative
estimate of the future size of the per unit area intact forest carbon sink
(Fig. 3). Extrapolations of the changes in the predictor variables from
1983-2015 forward to 31 December 2039 (Extended Data Fig. 5) show
declines in the sink on both continents (Fig. 3). By 2030 the carbon sink
in aboveground live biomass in intact African tropical forests is
predicted to decline by 14% from the measured 2010-15 mean to
0.57 Mg C ha⁻¹ yr⁻¹ (2σ range, 0.16-0.96; Fig. 3). The Amazon sink
continues to rapidly decline, reaching zero in 2035 (2σ range,
2011-2089; Fig. 3). Our estimated sink strength on both continents in
the 2020s and 2030s is sensitive to future CO₂ emissions pathways
(CO₂-change), resulting temperature increase (MAT, MAT-change) and
hydrological changes (MCWD), plus changes in forest dynamics (CRT), but
the sink is always lower than levels seen in the 2000s (see Methods and
Supplementary Table). Therefore, the carbon sink strength of the world's
two most extensive tropical forests has now saturated, albeit
asynchronously.

Table 2 | Minimum adequate models to predict carbon gains and losses in African and Amazonian forests

Carbon gains (Mg C ha⁻¹ yr⁻¹)

Predictor variable          Parameter value    Standard error    t value        P value            2000-2015 change in gains (%)ᵃ
Intercept                   5.255 | 5.395      0.603 | 0.614     8.7 | 8.8      <0.001             -
CO₂-change (ppm yr⁻¹)ᵇ      0.238              0.096             2.5            0.013              3.69% | 3.71%
MAT (°C)                    -0.088             0.025             -3.3           0.001              -0.67% | -1.07%
MAT-change (°C yr⁻¹)ᶜ       -1.243             0.233             -5.3           <0.001             0.58% | 0.00%ᵈ
MCWD (mm × 1,000)           -0.405 | -1.391    0.381 | 0.24      -1.1 | -5.8    0.289 | <0.001     -0.52% | -2.73%
Wood density (g cm⁻³)       -1.295             0.530             -2.4           0.015              0.05% | 0.00%

Carbon losses (Mg C ha⁻¹ yr⁻¹)ᵉ

Predictor variable          Parameter value    Standard error    t value        P value            2000-2015 change in losses (%)ᵃ
Intercept                   1.216              0.164             7.41           <0.001             -
CO₂-change (ppm yr⁻¹)ᵇ      0.130              0.059             2.2            0.026              11.38% | 14.81%
MAT-change (°C yr⁻¹)ᶜ       0.766              0.162             4.7            <0.001             -1.56% | 0.00%
MCWD (mm × 1,000)           -0.232             0.107             -2.2           0.030              -1.21% | -2.42%
CRT (years)                 -0.008             0.001             -6.1           <0.001             -0.57% | 1.39%

This table shows the best-ranked gains and loss models. Where continental values differ, those for Africa are reported first, followed by '|', then the Amazonian values.
ᵃ The January 2000 to 31 December 2014 change in gains/losses for each predictor variable was estimated allowing only the focal predictor to vary; this change was then expressed as a percentage of the annual gains/losses in the year 2000, allowing all predictors to vary.
ᵇ Change over the past 56 years (see Extended Data Fig. 2).
ᶜ Change over the past 5 years (see Extended Data Fig. 3).
ᵈ A positive value for Africa indicates that MAT increased more slowly over 2000-2015 compared to the mean increase over 1983-2015, therefore contributing to an increase in gains; a zero value for Amazonia indicates that the rate of MAT increase was the same over 2000-2016 as the mean increase over 1983-2016.
ᵉ Carbon loss values were normalized via power-law transformation, with power parameter λ = 0.361.


Scaling results to the pan-tropics 

Scaling our estimated mean sink strength by forest area for each
continent signifies that Earth has passed the point of peak carbon
sequestration into intact tropical forests (Table 1). The continental
sink in Amazonia peaked in the 1990s, followed by a decline, driven
by sink strength peaking in the 1990s and a continued decline in
forest area (Table 1). In Africa the per unit area sink strength peaked
later, in the 2000-2010 period, but the continental African sink peaked
in the 1990s, owing to the decline in forest area in the 2000s outpacing
the small per unit area increase in sink strength. Including the modest
uptake in the much smaller area of intact Asian tropical forest indicates
that total pan-tropical carbon uptake peaked in the 1990s (Table 1).
From the peak pan-tropical intact forest uptake of 1.26 Pg C yr⁻¹ in the
1990s, we project a continued decline reaching just 0.29 Pg C yr⁻¹ in the
2030s (multi-decade decline of ~0.24 Pg C yr⁻¹ per decade), driven by
(1) a decline in mean pan-tropical sink strength of 0.1 Mg C ha⁻¹ yr⁻¹
per decade and (2) ongoing forest area losses of ~13.5 million ha yr⁻¹
(see Extended Data Table 2 for forest area details). Critically,
climate-driven vegetation model simulations have not predicted that the
peak net carbon uptake into intact tropical forests has already been
passed.
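
The quoted multi-decade rate follows from the two end points. A minimal
sketch, assuming the 1990s and 2030s values are four decades apart and
that the decline is averaged linearly (the function and variable names
are illustrative):

```python
# Pan-tropical intact forest uptake (Pg C per year), values from the text.
UPTAKE_1990S = 1.26
UPTAKE_2030S = 0.29  # projected


def change_per_decade(start: float, end: float, n_decades: int) -> float:
    """Average linear change per decade between two decadal means."""
    return (end - start) / n_decades


# Assumption: 1990s -> 2030s is treated as a span of four decades.
rate = change_per_decade(UPTAKE_1990S, UPTAKE_2030S, 4)
print(f"{rate:.2f} Pg C per year, per decade")
```

The result is a change of about -0.24 Pg C yr⁻¹ per decade, matching the
magnitude given in the text.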


Discussion 

Our method of scaling to arrive at a pan-tropical sink estimate, in
common with other studies using similar datasets, is limited. Yet
pervasive net carbon uptake is expected given that we find a strong and
ongoing CO₂ fertilization effect. Using our CO₂ response in Table 2, we
find an increase in aboveground carbon stocks of 10.8 ± 3.7 Mg C ha⁻¹
per 100 ppm CO₂, equivalent to 6.5 ± 2.2% (± standard error; using
an area-weighted pan-tropical mean aboveground carbon stock of
165 Mg C ha⁻¹). This is comparable to the 5.0 ± 1.2% increase in tropical
forest C stocks per 100 ppm CO₂ derived from a recent synthesis of CO₂
fertilization experiments, despite a lack of data from old-growth
tropical forests. Our result is within the range of climate-driven
vegetation models, although it is greater than results from a number of
recently published models that include potential nutrient constraints,
reported as 5.9 ± 4.7 Mg C ha⁻¹ per 100 ppm CO₂. We find that
the CO₂ fertilization-driven uptake is currently only partially offset by
the negative impacts of similarly widespread rising air temperatures
(-2.0 ± 0.4 Mg C ha⁻¹ °C⁻¹, from Table 2), consistent with models, limited
experiments and independent observations, plus well-documented
negative responses to drought. Long-term and extensive increases
in satellite-derived greenness in tropical regions that have not
experienced major changes in land-use management, particularly in
central Africa in the past decade, indicate increases in tropical forest
net primary productivity, providing further evidence that the sink is a
widespread phenomenon.
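
The conversion from an absolute stock response to a percentage can be
reproduced from the numbers in the paragraph above. This is a small
illustrative sketch; the constant and function names are not from the
paper.

```python
# Values quoted in the text (Mg C per hectare).
STOCK_RESPONSE = 10.8     # stock increase per 100 ppm CO2
STOCK_RESPONSE_SE = 3.7   # standard error of that increase
MEAN_STOCK = 165.0        # area-weighted pan-tropical mean stock


def as_percent_of_stock(value: float, stock: float = MEAN_STOCK) -> float:
    """Express an absolute stock change as a percentage of the mean stock."""
    return 100.0 * value / stock


pct = as_percent_of_stock(STOCK_RESPONSE)
pct_se = as_percent_of_stock(STOCK_RESPONSE_SE)
print(f"{pct:.1f} +/- {pct_se:.1f} % per 100 ppm CO2")
```

Dividing by the 165 Mg C ha⁻¹ mean stock gives 6.5 ± 2.2%, as stated.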

Nonetheless, our analyses suggest that this pervasive intact tropical
forest sink in live biomass is in long-term decline, having peaked first
in Amazonia, and more recently followed by African forests, explaining
the prior Africa-Amazon carbon sink divergence as part of a longer-term
pattern of asynchronous saturation and decline. Over time, the
continued CO₂ fertilization effect is being increasingly counteracted
by the impacts of higher temperatures and droughts on tree growth
and mortality, which are modulated by internal forest dynamics, with
forests with the shortest CRT saturating first.

From an atmospheric perspective, the full impacts of the contribution
to the saturation of the sink from slowing carbon gains are
experienced immediately, but the contribution from rising carbon
losses is delayed because dead trees do not decompose instantaneously.
Decomposition of this dead tree mass is about half complete
in 4 years, and about 85% complete in 10 years, so rising carbon losses
result in delayed carbon additions to the atmosphere. Hence, from
an atmospheric perspective, the intact tropical forest biomass carbon
sink probably peaked a few years later than our inventory data indicate
and the full impacts are not yet realized. The pan-tropical carbon
sink in live biomass declined by 0.27 Pg C yr⁻¹ between the 1990s and
2000s (Table 1), but accounting for dead wood decomposition shows
a smaller 0.17 Pg C yr⁻¹ reduction from an atmospheric perspective
(see Methods).
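
To see how the two decomposition figures relate, one can assume a
single-pool, first-order (exponential) decay. That is an assumption made
here for illustration, not necessarily the decomposition model used in
the Methods, and the function names are illustrative. Calibrating the
rate constant to "half complete in 4 years" gives roughly 82% decomposed
after 10 years, in the neighbourhood of the quoted "about 85%".

```python
import math

HALF_LIFE_YEARS = 4.0  # "about half complete in 4 years" (from the text)


def decay_constant(half_life: float) -> float:
    """First-order rate constant k (per year) for a given half-life."""
    return math.log(2.0) / half_life


def fraction_decomposed(years: float, k: float) -> float:
    """Fraction of dead wood mass released after `years` of exponential decay."""
    return 1.0 - math.exp(-k * years)


k = decay_constant(HALF_LIFE_YEARS)
for t in (4, 10):
    print(f"after {t:>2} years: {fraction_decomposed(t, k):.0%} decomposed")
```

Under this simple model, a pulse of tree mortality keeps adding carbon
to the atmosphere for more than a decade, which is why the atmospheric
signal lags the inventory signal.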

Given that the overall global terrestrial carbon sink is increasing, a
weakening intact tropical forest sink implies that the extra-tropical
carbon sink has increased over the past two decades. Independent
observations of interhemispheric atmospheric CO₂ concentration
indicate that carbon uptake into the Northern Hemisphere landmass
has increased at a greater rate than the global terrestrial carbon sink
since the 1990s, with a further disproportionate increase in the 2000s.
The interhemispheric analysis suggests a weakening of the tropical
forest sink by ~0.2 Pg C yr⁻¹ between the 1990s and 2000s, which is
similar to the 0.17 Pg C yr⁻¹ weakening over the same time period that
we find. This reinforces our conclusion that the intact tropical forest
carbon sink has already saturated.

Fig. 3 | Modelled past and future carbon dynamics of structurally intact
old-growth tropical forests in Africa and Amazonia. a-f, Predictions of
net aboveground live biomass carbon (a, d), carbon gains (b, e), and
carbon losses (c, f), for African (left panels) and Amazonian (right
panels) plot inventory networks, based on CO₂-change, MAT, MAT-change,
drought (MCWD), plot wood density, and plot CRT, using observations in
Africa until 31 December 2014 and Amazonia until mid-2011, and
extrapolations of prior trends to

In summary, our results indicate that although intact tropical forests
remain major stores of carbon and are key centres of biodiversity, their
ability to sequester additional carbon in trees is waning. In the 1990s
intact tropical forests removed 17% of anthropogenic CO₂ emissions.
This declined to an estimated 6% in the 2010s, because the pan-tropical
weighted average per unit area sink strength declined by 33%, forest
area decreased by 19% and anthropogenic CO₂ emissions increased by
46%. Although tropical forests are more immediately threatened by
deforestation and degradation, and the future carbon balance will
also depend on secondary forest dynamics and forest restoration
plans, our analyses show that they are also affected by atmospheric
chemistry and climatic changes. Given that the intact tropical forest
carbon sink is set to end sooner than even the most pessimistic
climate-driven vegetation models predict, our analyses suggest that
climate change impacts in the tropics may become more severe than
predicted. Furthermore, the carbon balance of intact tropical forests
will only stabilize once CO₂ concentrations and the climate stabilize.
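
The fall from 17% to about 6% can be checked from the three percentage
changes given above. The sketch assumes that total uptake scales as the
product of per unit area sink strength and forest area, and that the
share removed is uptake divided by emissions; the function name is
illustrative.

```python
SHARE_1990S = 17.0            # % of anthropogenic CO2 emissions removed
SINK_STRENGTH_CHANGE = -0.33  # per unit area sink strength, 1990s -> 2010s
FOREST_AREA_CHANGE = -0.19    # intact forest area
EMISSIONS_CHANGE = 0.46       # anthropogenic CO2 emissions


def later_share(share: float, d_strength: float, d_area: float,
                d_emissions: float) -> float:
    """Share of emissions removed after the three fractional changes."""
    uptake_factor = (1.0 + d_strength) * (1.0 + d_area)
    return share * uptake_factor / (1.0 + d_emissions)


share_2010s = later_share(SHARE_1990S, SINK_STRENGTH_CHANGE,
                          FOREST_AREA_CHANGE, EMISSIONS_CHANGE)
print(f"{share_2010s:.1f}% of emissions removed in the 2010s")
```

The product gives about 6.3%, consistent with the estimated 6%.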

Continued on-the-ground monitoring of the world's remaining intact
tropical forests will be required to test our prediction that the carbon
sink in live trees will continue to decline, particularly as future changes
in the tree species composition may alter the resilience of the sink and
because we cannot exclude the possibility of decadal-scale climate
impacts on these forests. Such direct ground-based measurements
also provide a constraint on estimating the size, location and climate
sensitivity of the terrestrial carbon sink. In addition, our conclusion
that tree mortality and internal forest dynamics are important controls
on the future of the tropical forest carbon sink may assist in improving
the vegetation components of Earth System Models and contribute to
reducing terrestrial carbon cycle feedback uncertainty. Our findings
also have policy implications. At the individual country level, given that
intact tropical forests are a carbon sink but the rate of reduction will
differ continentally and probably regionally (for example, aseasonal
Amazon forests are less affected by droughts), national greenhouse gas
reporting will require careful forest monitoring. At the international
level, given that tropical forests are likely to sequester less carbon in
the future than Earth System Models predict, an earlier date by which to
reach net zero anthropogenic greenhouse gas emissions will be required
to meet any given commitment to limit the global heating of Earth.

31 December 2039. Model predictions are in blue (Africa) and brown
(Amazon), with solid lines spanning the window when >75% of plots were
monitored to show model consistency with the observed trends, and
shading showing upper and lower confidence intervals accounting for
uncertainties in the model (both fixed and random effects) and
uncertainties in the predictor variables. Light grey lines and grey
shading are the mean and 95% CI of the observations from the African and
Amazonian plot networks.


Methods


Plot selection

Closed canopy (that is, not woody savannah) old-growth mixed-age
forest inventory plots were selected using commonly used criteria:
structurally intact (that is, free of fire and industrial logging);
all trees with diameter at reference height ≥100 mm measured at least
twice; area ≥0.2 ha; altitude <1,500 m above sea level; MAT >20.0 °C;
annual precipitation ≥1,000 mm; located ≥50 m from anthropogenic
forest edges. Of the 244 plots included in the study, 217 contribute
to the African Tropical Rainforest Observatory Network (AfriTRON;
www.afritron.org), with data curated at www.ForestPlots.net. These
include plots from Sierra Leone, Liberia, Ghana, Nigeria, Cameroon,
Gabon, Republic of Congo, Democratic Republic of Congo, Uganda and
Tanzania (Extended Data Fig. 1). Fifteen plots are part of the TEAM
network, from Cameroon, Republic of Congo, Tanzania and Uganda.
Nine plots contribute to the ForestGEO network, from Cameroon and
Democratic Republic of Congo (9 plots from Democratic Republic of
Congo, with codes SNG, contribute to both AfriTRON and ForestGEO
networks, included above in the AfriTRON total). Finally, three plots
from Central African Republic are part of the CIRAD network. The
large majority of plots are sited in terra firme (not inundated by river
water) forests and have mixed species composition, although four are in
seasonally flooded forest and 14 plots are in Gilbertiodendron dewevrei
monodominant forest, a locally common forest type in Africa
(Supplementary Table 1). The 244 plots have a mean size of 1.1 ha
(median, 1 ha), with a total plot area of 277.9 ha. The dataset comprises
391,968 diameter measurements on 135,625 stems, of which 89.9% were
identified to species, 97.5% to genus and 97.8% to family. Mean total
monitoring period is 11.8 years, mean census length 5.7 years, with a
total of 3,214 hectare years of monitoring. The 321 Amazon plots are
published and were selected using the same criteria, except that in the
African selection criteria we specified a minimum anthropogenic edge
distance and added a minimum temperature threshold.


Plot inventory and tree biomass carbon estimation 

Tree-level aboveground biomass carbon is estimated using an allometric
equation with parameters for tree diameter (in mm), tree height (in
m) and wood mass density (in g cm⁻³). The calculation of each is
discussed in turn. All calculations were performed using the R statistical
platform, version 3.2.1, using the BiomasaFP R package, version
0.2.1.


Tree diameter. In all plots, all woody stems with >100 mm diameter at
1.3 m from the base of the stem ('diameter at breast height', DBH, in mm),
or 0.5 m above deformities or buttresses, were measured, mapped and
identified using standard forest inventory methods. The height of
the point of measurement (POM) was marked on the trees and recorded,
so that the same POM is used at the subsequent forest census. For stems
developing deformities or buttresses over time that could potentially
disturb the initial POM, the POM was raised approximately 500 mm
above the deformity. Estimates of the diameter growth of trees with
changed POM used the ratio of new to old POMs, to create a single
trajectory of growth from the series of diameters at two POM heights.
We used standardized protocols to assess typographical errors and
potentially erroneous diameter values (for example, trees shrinking
by >5 mm), missing values, failures to find the original POM, and other
issues. Where necessary we estimated the likely value via interpolation
or extrapolation from other measurements of that tree, or when this
was not possible we used the median or mean growth rate of trees in the
same plot, census and size class. We used the median growth rate for size
classes of DBH = 100-199 mm and 200-399 mm. We used the mean
growth rate for the size class with DBH > 400 mm, as there were fewer
trees in the largest size class. We interpolated measurements for 1.3%
of diameters, extrapolated 0.9%, and used median growth rates for 15%.


Tree height. Height of individuals from the ground to the top leaf,
hereafter $H_t$, was measured in 204 plots, using a laser hypsometer
(Nikon Forestry Pro) from directly below the crown (most plots), a laser
or ultrasonic distance device with an electronic tilt sensor, a manual
clinometer, or by direct measurement (that is, climbing the tree). Only
trees where the top was visible were selected. In most plots, tree
selection was similar: the 10 largest trees were measured, together with
10 randomly selected trees per diameter class from five classes:
100-199 mm, 200-299 mm, 300-399 mm, 400-499 mm, and 500+ mm trees,
following standard protocols. We measured the actual height of 24,270
individual trees from 204 plots. We used these data and the local.heights
function in the R package BiomasaFP to fit 3-parameter Weibull
relationships:


$$H_t = a\left(1 - e^{-b \cdot \mathrm{DBH}^{c}}\right) \qquad (1)$$


We chose the Weibull model (with Weibull parameters a, b and c)
because it is known to be robust. We parameterized separate
$H_t$-DBH relationships for four different combinations of edaphic forest
type and biogeographical region: (1) terra firme forest in West Africa,
(2) terra firme forest in Lower Guinea and the Western Congo Basin, (3)
terra firme forest in Eastern Congo Basin and East Africa, (4) seasonally
flooded forest from Lower Guinea and the Western Congo Basin (there
were no seasonally flooded forest plots in the other biogeographical
regions). The parameters are: (1) terra firme forest in West Africa,
a = 56.0; b = 0.0401; c = 0.744; (2) terra firme forest in Lower Guinea and
the Western Congo Basin, a 0.755; (3) terra firme
forest in the Eastern Congo Basin and East Africa, a = 50.8; b = 0.0499;
c = 0.706; and finally (4) seasonally flooded forest from Lower Guinea
and the Western Congo Basin, a = 38.2; b = 0.0605; c = 0.760. For each
of these combinations of forest type and bioregion, the local.heights
function combines all height measurements from all plots belonging to
that forest type/bioregion and fits the Weibull model parameters using
nonlinear least squares (nls function in R with default settings), with
starting values of a = 25, b = 0.05 and c = 0.7 chosen because they led
to regular model convergence. We fitted these models either treating
each observation equally or with weights proportional to each tree's
basal area. These weights give more importance to large trees during
model fitting. We selected the best fitting of these models, determining
this to be the model that minimized prediction error of stand biomass
when calculated with estimated heights or observed heights. In this
way, we selected the non-weighted model for terra firme forests in
Lower Guinea/Western Congo Basin and for flooded forests in the Lower
Guinea/Western Congo Basin; we selected the weighted model for the
other two biogeographical regions (West Africa and Eastern Congo
Basin/East Africa). The parameters were used to estimate $H_t$ from DBH
for all tree DBH measurements for input into the allometric equation.
Median measured individual total tree height is 20.5 m; the height range
is 3.1 to 72.5 m. The root mean squared error (RMSE) between the full
dataset of measured heights and the predicted heights is 5.7 m, which is
8.0% of the total range. Furthermore, RMSE is 5.3 m in terra firme forest
in West Africa (7.5% of the range; n = 9,771 trees); RMSE is 6.4 m in terra
firme forest in Lower Guinea and the Western Congo Basin (8.7% of the
range; n = 10,838 trees); RMSE is 4.8 m in terra firme forest in the Eastern
Congo Basin and East Africa (8.8% of the range; n = 3,269 trees); and
RMSE is 4.1 m in seasonally flooded forest from Lower Guinea and the
Western Congo Basin (12.5% of the range; n = 392 trees).
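
As an illustration of how equation (1) turns a diameter into a height,
the sketch below applies the Weibull form with the three complete
parameter sets quoted above. Two points are assumptions rather than
statements from the text: the diameter is converted from mm to cm before
use (the unit of DBH in equation (1) is not given here, and cm yields
heights in line with the reported median of 20.5 m), and the dictionary
keys and function name are illustrative.

```python
import math

# Weibull parameters (a, b, c) quoted in the text. The set for terra firme
# forest in Lower Guinea/Western Congo Basin is incomplete in the source
# and is therefore omitted.
WEIBULL_PARAMS = {
    "terra_firme_west_africa": (56.0, 0.0401, 0.744),
    "terra_firme_eastern_congo_east_africa": (50.8, 0.0499, 0.706),
    "flooded_lower_guinea_western_congo": (38.2, 0.0605, 0.760),
}


def weibull_height(dbh_mm: float, a: float, b: float, c: float) -> float:
    """Estimate total tree height (m) as a * (1 - exp(-b * D**c)).

    Assumption: D is the diameter in cm, so the mm input is divided by 10.
    """
    dbh_cm = dbh_mm / 10.0
    return a * (1.0 - math.exp(-b * dbh_cm ** c))


if __name__ == "__main__":
    for region, params in WEIBULL_PARAMS.items():
        print(region, round(weibull_height(300.0, *params), 1), "m")
```

The parameter a acts as an asymptotic maximum height, so predicted
heights level off for very large stems instead of growing without bound.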


Wood density. Dry wood density (ρ) measurements were compiled
for 730 African species from published sources and stored in
www.ForestPlots.net; most were sourced from the Global Wood Density
Database on the Dryad digital repository (www.datadryad.org).
Each individual in the tree inventory database was matched to a
taxon-specific mean wood density value. Species in both the tree
inventory and wood density databases were standardized for orthography
and synonymy using the African Plants Database
(www.ville-ge.ch/cjb/bd/africa/) to maximize matches. For incompletely
identified individuals or for individuals belonging to species not in
the ρ database, we used the mean ρ value for the next-highest known
taxonomic category (genus or family, as appropriate). For unidentified
individuals, we used the mean wood density value of all individual trees
in the plot.


Allometric equation. For each tree we used a published allometric
equation to estimate aboveground biomass. We then converted this
to carbon, assuming that aboveground carbon (AGC, in Mg C ha⁻¹) is
45.6% of aboveground biomass. Thus:


$$\mathrm{AGC} = 0.456 \times \left(0.0673 \times \left(\rho \times \mathrm{DBH}^{2} \times H_t\right)^{0.976}\right) / 1{,}000 \qquad (2)$$

with DBH in mm, dry wood density ρ in g cm⁻³, and total tree height $H_t$
in m. Aboveground carbon in living biomass for each plot at
each census date was estimated as the sum of the AGC of each living
stem, divided by plot area (in hectares).
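
A minimal sketch of equation (2) and the plot-level sum follows. The
coefficients 0.0673 and 0.976 match a widely used pantropical allometry
that is parameterised for diameter in cm and biomass in kg, so the code
converts DBH from mm to cm first; that unit handling is an assumption of
this sketch, as are the function names and the example stems.

```python
from typing import Iterable, Tuple


def tree_agc_mg(dbh_mm: float, wood_density: float, height_m: float) -> float:
    """Aboveground carbon of one tree (Mg C), following equation (2).

    Assumption: the allometry expects diameter in cm, so DBH is converted
    from mm. The biomass term is in kg and is divided by 1,000 to give Mg.
    """
    dbh_cm = dbh_mm / 10.0
    agb_kg = 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976
    return 0.456 * agb_kg / 1000.0


def plot_agc_mg_per_ha(trees: Iterable[Tuple[float, float, float]],
                       plot_area_ha: float) -> float:
    """Sum tree-level AGC over living stems and divide by plot area."""
    return sum(tree_agc_mg(*tree) for tree in trees) / plot_area_ha


# Hypothetical stems: (DBH in mm, wood density in g cm^-3, height in m).
example_trees = [(150.0, 0.60, 14.0), (420.0, 0.65, 28.0), (900.0, 0.70, 40.0)]
print(round(plot_agc_mg_per_ha(example_trees, plot_area_ha=1.0), 3), "Mg C per ha")
```

Because the exponent is close to 1, a tree's carbon scales almost
linearly with the product of wood density, squared diameter and height,
which is why a few large stems dominate a plot's total.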


Carbon gain and carbon loss estimation 

Net carbon sink (in Mg C ha⁻¹ yr⁻¹) is estimated as carbon gains minus
carbon losses. Carbon gains (in Mg C ha⁻¹ yr⁻¹) are the sum of the
aboveground live biomass carbon additions from the growth of surviving
stems and the addition of newly recruited stems (recruits are stems
reaching a DBH > 100 mm during a given census interval), divided
by the census length (in years) and plot area (in hectares). For each
stem that survived a census interval, carbon additions from its growth
(Mg C ha⁻¹ yr⁻¹) were calculated as the difference between its AGC at the
end census of the interval and its AGC at the beginning census of the
interval. For each stem that recruited during the census interval (that
is, reaching DBH > 100 mm), carbon additions were calculated in the
same way, assuming DBH = 0 mm at the start of the interval, following
standard procedures. Carbon losses (in Mg C ha⁻¹ yr⁻¹) are estimated
as the sum of aboveground biomass carbon from all stems that
died during a census interval, divided by the census length (in years)
and plot area (in hectares). Both carbon gains and carbon losses are
calculated using standard methods, including a census interval bias
correction, using the SummaryAGWP function of the R package
BiomasaFP.
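
The bookkeeping described above can be sketched as follows. This is an
illustrative, uncorrected calculation (it omits the census interval bias
correction) and is not the SummaryAGWP implementation; the `Stem` class
and `census_fluxes` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stem:
    agc_start: Optional[float]  # Mg C at first census; None if not yet recruited
    agc_end: Optional[float]    # Mg C at second census; None if the stem died


def census_fluxes(stems: List[Stem], years: float,
                  area_ha: float) -> Tuple[float, float, float]:
    """Return (gains, losses, net sink) in Mg C per ha per year."""
    gains = 0.0
    losses = 0.0
    for stem in stems:
        if stem.agc_end is not None:
            # Survivor or recruit; a recruit starts from zero carbon (DBH = 0).
            gains += stem.agc_end - (stem.agc_start or 0.0)
        elif stem.agc_start is not None:
            # The stem died in the interval: its starting carbon is a loss.
            losses += stem.agc_start
    scale = years * area_ha
    return gains / scale, losses / scale, (gains - losses) / scale


# Hypothetical plot: one survivor, one recruit and one dead stem.
plot = [Stem(1.0, 1.2), Stem(None, 0.05), Stem(0.5, None)]
gains, losses, net = census_fluxes(plot, years=5.0, area_ha=1.0)
print(gains, losses, net)
```

In this toy plot the losses exceed the gains, so the net sink is
negative even though every living stem grew.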

As carbon gains (and losses, see below) are affected by a census
interval bias, with the underestimate increasing with census length,
we corrected this bias by accounting for (1) the carbon additions from
trees that grew before they died within an interval (unobserved growth)
and (2) the carbon additions from trees that reached 100 mm DBH (that
is, were recruited) and then died within the same interval (unobserved
recruitment).

The first component, the unobserved growth of a stem that died
during a census interval, is estimated as the difference between AGC
at death and AGC at the start of the census. These are calculated using
equation (2), from $\mathrm{DBH}_{death}$ and $\mathrm{DBH}_{start}$,
respectively. The latter is part of the data; the former can be estimated
as $\mathrm{DBH}_{death} = \mathrm{DBH}_{start} + G \times Y_{mean}$,
where G is the plot-level median diameter growth rate (in mm yr⁻¹) of the
size class the tree was in at the start of the census interval (size classes
are defined as DBH < 200 mm, 400 mm > DBH > 200 mm and DBH > 400
mm) and $Y_{mean}$ is the mean number of years that trees survived in the
census interval before dying. $Y_{mean}$ is calculated from the number of
trees that are expected to have died in each year of the census interval,
which is derived from the plot-level per capita mortality rate ($m_a$; as
percentage of dead trees per year) calculated following equation (5) in
the cited reference.

The second component, the growth of recruits that were not
observed because they died during the census interval, is estimated
by calculating the number of unobserved recruits and the diameter at
death for each unobserved recruit. The number of unobserved recruits in a
given year (stems ha⁻¹ yr⁻¹) is estimated as
$N_{ur} = R_a - P_{surv} \times R_a$, where $R_a$ (number of recruited
stems ha⁻¹ yr⁻¹) is the per-area annual recruitment calculated following
equation (11) in the cited reference and $P_{surv}$ is the probability of
each recruit surviving until the next census: $P_{surv} = (1 - m_a)^{T}$,
where T is the number of years remaining in the census interval. Summing
$N_{ur}$ for each year in a census interval gives the total number of
unobserved recruits in that census interval. We then estimate diameter at
death for each unobserved recruit, which is given in millimetres by
$\mathrm{DBH}_{death,ur} = 100 + (G_s \times Y_{mean,rec})$, where $G_s$
is the plot-level median diameter growth rate (in mm yr⁻¹) of the smallest
size class (that is, DBH < 200 mm) and $Y_{mean,rec}$ is the mean number
of years that unobserved recruits survived in the census interval before
dying. $Y_{mean,rec}$ is calculated as follows: from $m_a$ we can
calculate the number of recruits in a given year that died in each
subsequent year, and from this calculate the mean lifespan of recruits
in a given year that died before the next census; $Y_{mean,rec}$ is then
the mean of each year's recruit-lifespan, weighted by the number of
unobserved recruits in each year.
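
The unobserved-recruit count can be sketched as a year-by-year sum. The
code below is an illustration under stated assumptions, not the
BiomasaFP implementation: the mortality rate is passed as a fraction
(0.02 for 2% per year), recruits entering in year index y of an L-year
interval are given T = L - y years to survive, and all names are
illustrative.

```python
def unobserved_recruits(r_a: float, m_a: float, interval_years: int) -> float:
    """Total unobserved recruits (stems per ha) over one census interval.

    r_a: annual recruitment (stems per ha per year)
    m_a: annual per capita mortality as a fraction (assumption)
    """
    total = 0.0
    for year in range(interval_years):
        years_remaining = interval_years - year   # T in the text (assumed)
        p_surv = (1.0 - m_a) ** years_remaining   # survive to the next census
        total += r_a - p_surv * r_a               # recruits that die unseen
    return total


def diameter_at_death_mm(g_s: float, y_mean_rec: float) -> float:
    """Diameter at death of an unobserved recruit: 100 mm plus its growth."""
    return 100.0 + g_s * y_mean_rec


if __name__ == "__main__":
    # Hypothetical plot: 10 recruits per ha per year, 2% mortality, 5 years.
    print(round(unobserved_recruits(10.0, 0.02, 5), 2), "stems per ha")
    print(diameter_at_death_mm(2.0, 1.5), "mm")
```

Longer census intervals give recruits more time to die unseen, which is
why the correction grows with census length.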

The census interval bias correction (components one and two
combined) typically adds <3% to plot-level carbon gains calculated for
each plot census interval. Carbon losses are affected by the same census
interval bias, so we corrected this bias by accounting for the additional
carbon losses from the trees that were recruited and then died within
the same interval, and the additional carbon losses resulting from the
growth of the trees that died in the interval. These two components
are calculated in the same way as for carbon gains and typically add
<3% to plot-level carbon losses.

Carbon gains include both gains from the growth of surviving stems
and new recruits. Separating carbon gains from the tree growth of
surviving stems and newly recruited stems shows that carbon gains from
recruitment are small overall, and are significantly lower in Africa than
in the Amazon (in Africa, 0.17 Mg C ha⁻¹ yr⁻¹; CI: 0.16-0.18 versus in the
Amazon, 0.27 Mg C ha⁻¹ yr⁻¹; CI: 0.25-0.28, P < 0.001; two-way Wilcoxon
test), but this is compensated by carbon gains from survivors being
significantly larger in Africa (2.33 Mg C ha⁻¹ yr⁻¹; CI: 2.27-2.39) than in
the Amazon (2.13 Mg C ha⁻¹ yr⁻¹; CI: 2.09-2.17, P = 0.014). Therefore,
gains overall (sum of gains from surviving stems and newly recruited
stems) are indistinguishable between the continents (in Africa,
2.57 Mg C ha⁻¹ yr⁻¹; CI: 2.51-2.67 versus in the Amazon, 2.46 Mg C ha⁻¹ yr⁻¹;
CI: 2.41-2.50, P = 0.460; two-way Wilcoxon test). The lower carbon
gains from recruitment in Africa are probably due to the lower stem
turnover rates and longer CRT.


Long-term gain, loss and net carbon sink trend estimation 
The estimated mean and uncertainty in carbon gains, carbon losses and
the net carbon sink of the African plots from 1 January 1983 to 31 December
2014 (Fig. 1, Extended Data Fig. 7 and Extended Data Fig. 8) were
calculated following ref. to allow direct comparison with published
Amazonian results. First, each census interval value was interpolated for
each 0.1-year period within the census interval. Then, for each 0.1-year
period between 1 January 1983 and 31 December 2014, we calculated
a weighted mean of all plots monitored at that time, using the square
root of plot area as a weighting factor. Confidence intervals for each
0.1-year period were bootstrapped.
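
A minimal Python sketch of the time-step averaging follows. The data layout
(a list of dictionaries with hypothetical `area_ha` and `intervals` keys) and
the choice to bootstrap by resampling plots with replacement are assumptions
made for illustration; the text only states that the intervals were
bootstrapped.

```python
import math
import random


def value_at(plot, t):
    """Return the census-interval value covering time t, or None if unmonitored."""
    for start, end, value in plot["intervals"]:
        if start <= t < end:
            return value
    return None


def weighted_mean(plots, t):
    """Mean over plots monitored at time t, weighted by sqrt(plot area)."""
    pairs = [(math.sqrt(p["area_ha"]), value_at(p, t)) for p in plots]
    pairs = [(w, v) for w, v in pairs if v is not None]
    if not pairs:
        return None
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)


def bootstrap_ci(plots, t, n_boot=1000, seed=0):
    """Percentile 95% interval from resampling plots with replacement (assumption)."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        sample = [rng.choice(plots) for _ in plots]
        m = weighted_mean(sample, t)
        if m is not None:
            means.append(m)
    if not means:
        return None
    means.sort()
    return means[int(0.025 * len(means))], means[int(0.975 * len(means)) - 1]
```

Calling `weighted_mean(plots, t)` for t = 1983.0, 1983.1, and so on up to
2014.9 would give one value per 0.1-year step.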
Trends in carbon gains, losses and the net carbon sink over time were
assessed using linear mixed effects models (lmer function in R, lme4
package), providing the linear slopes reported in Fig. 1. These models
regress the midpoint of each census interval against the value of the
response variable for that census interval. Plot identity was included
as a random effect, that is, by assuming that the intercept can vary
randomly among plots. We did not include slope as a random effect,
consistent with previously published Amazon analyses, because models
did not converge owing to some plots having too few census intervals.
Observations were weighted by plot size and census interval length.
Weighting for the Africa data was derived empirically, by assuming
a priori that there is no significant relation between the net carbon sink
and census interval length or plot size, following ref. The following
weighting removes all pattern in the residuals:

$\mathrm{Weight} = \sqrt[3]{\mathrm{length}_{\mathrm{int}}} + \sqrt[4]{\mathrm{plotsize}} - 1 \qquad (3)$

where $\mathrm{length}_{\mathrm{int}}$ is the length of the census interval, in years. Significance
was assessed by regressing the residuals of the net carbon sink model
against the weights (P = 0.702). Similar published weighting was used
for the Amazon plots.
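
As a small illustration, the weighting can be written as a one-line Python
function. It reads equation (3) as the cube root of the interval length plus
the fourth root of the plot size, minus one; the function name is
hypothetical and the plot size is assumed to be in hectares.

```python
def census_weight(interval_length_years, plot_size_ha):
    """Observation weight: cube root of length + fourth root of plot size - 1."""
    return interval_length_years ** (1.0 / 3.0) + plot_size_ha ** 0.25 - 1.0
```

Under this reading, a 1-ha plot with a 1-year interval has a weight of 1, and
the weight grows only slowly with longer intervals or larger plots.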

Differences in long-term slopes between the two continents for
carbon gains, carbon losses and net carbon sink, reported in the main
text, were also assessed using linear mixed effects models and weighting,
as described above, but performed on the combined African and
Amazonian datasets and limited to their common time window, 1 January
1983 to mid-2011. For these three tests on the pooled data (gains,
losses and net sink) we included an additional interaction term between
census interval date and continent, where a significant interaction
would indicate that the slopes differ between continents. The statistical
significance of continental differences in slope was assessed using
the F-statistic (ANOVA function in R, car package). Shortening the
common time window to the 20 years when the continents are best-
sampled, mid-1991 to mid-2011, gave very similar results, including a
divergent continental sink (P = 0.04).


Continental and pan-tropical carbon sink estimates

The per unit area total net carbon sink (in Mg C ha⁻¹ yr⁻¹) for each time
period in Table 1 (each decade between 1 January 1980 and 31 December
2009; and between 1 January 2010 and 31 December 2014) is the sum
of three components. The first component is the per unit area
aboveground carbon sink from living trees and lianas with DBH ≥100 mm.
For Africa we use the per unit area net carbon sink values presented
in this paper. For Amazonia, we use data in ref. For Southeast Asia,
we use inventory data collected using similar standardized methods
from 49 plots in ref. For each time window, we use all plots for which
census dates overlap the period, weighted by the square root of plot
area, as for the solid lines in Fig. 1. The second component is the per
unit area aboveground carbon sink from living trees and lianas with
DBH <100 mm. This is calculated as 5.19%, 9.40% and 5.46% of the first
component (that is, aboveground carbon of large living trees) in Africa,
Amazonia and Southeast Asia respectively. The third component is
the per unit area belowground carbon sink in live biomass, that is, roots.
This is calculated as 25%, 37% and 17% of the aboveground carbon of
living trees with DBH ≥100 mm in Africa, Amazonia and Southeast
Asia respectively.

For each time period in Table 1 we calculated the continental-scale
total carbon sink (Pg C yr⁻¹) by multiplying the per unit area total net
carbon sink described above by the area of intact forest on each
continent at that time interval (in ha) reported in Extended Data Table 2.
Decades run from 1 January to 31 December (for example, 1 January 1990
to 31 December 1999). For comparability with previous continental-sink
results, we used continental values of intact forest area for 1990, 2000,
2005 and 2010 as published in ref., that is, total forest area minus forest
regrowth. We used the 1990-2010 data to fit an exponential model for each
continent and used this model to estimate intact forest area for 1980 and 2015.
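
The three-component sum and the scaling to continental totals can be sketched
as below. The percentages are those quoted in the text; the function and
dictionary names are hypothetical, and the forest area passed in is a
placeholder argument rather than a value from Extended Data Table 2.

```python
# Fractions of the large-tree (DBH >= 100 mm) aboveground sink, from the text.
SMALL_STEM_FRACTION = {"Africa": 0.0519, "Amazonia": 0.0940, "Southeast Asia": 0.0546}
ROOT_FRACTION = {"Africa": 0.25, "Amazonia": 0.37, "Southeast Asia": 0.17}

MG_PER_PG = 1e9  # 1 Pg = 1e15 g and 1 Mg = 1e6 g


def total_sink_per_ha(aboveground_large_sink, continent):
    """Sum of large-tree, small-stem and root components (Mg C per ha per year)."""
    small = aboveground_large_sink * SMALL_STEM_FRACTION[continent]
    roots = aboveground_large_sink * ROOT_FRACTION[continent]
    return aboveground_large_sink + small + roots


def continental_sink_pg(total_per_ha, intact_area_ha):
    """Scale a per-hectare sink (Mg C per ha per year) to Pg C per year."""
    return total_per_ha * intact_area_ha / MG_PER_PG
```

For Africa, for example, the total per-hectare sink is 1.3019 times the
large-tree aboveground sink under these fractions.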

Finally, in the main text we calculated the proportion of anthropogenic
CO₂ emissions removed by Earth's intact tropical forests, as
the total pan-tropical carbon sink from Table 1 divided by the total
anthropogenic CO₂ emissions. Total anthropogenic CO₂ emissions
are calculated as the sum of emissions from fossil fuel and land-use
change and are estimated at 7.6 Pg C yr⁻¹ in the 1990s, 9.0 Pg C yr⁻¹ in
the 2000s, and 11.1 Pg C yr⁻¹ in the 2010s (ref. , assuming 1.7% growth
in fossil fuel emissions in 2018 and 2019, and mean 2010-2017 land-use
change emissions for 2018 and 2019).


Carbon sink from an atmospheric perspective
To estimate the evolution of the carbon sink from an atmospheric
perspective, we assumed that the contribution to the atmosphere
from carbon gains is experienced immediately, while the contribution
to the atmosphere from carbon losses must take into account
the delay in decomposition of dead trees. We did this by calculating
total forest carbon loss (Mg C ha⁻¹ yr⁻¹) for each year in the period
1 January 1950 to 31 December 2014, using the mean 1 January 1983 to
31 December 2014 records from Fig. 1 and assuming constant losses
before 1983 (1.9 Mg C ha⁻¹ yr⁻¹ and 1.5 Mg C ha⁻¹ yr⁻¹ for Africa and
Amazonia respectively). Then, for each focal year in the period 1950-2014,
we calculated how much carbon was released to the atmosphere in the
subsequent years as follows:
$y_t = x_0 \times e^{-0.17(t-1)} - x_0 \times e^{-0.17t}$, where $x_0$ is the
total forest carbon loss of the focal year; $y_t$ is the carbon released to the
atmosphere at $t$ years from the focal year; and -0.17 yr⁻¹ is a constant
decomposition rate calculated for tropical forests in the Amazon.
For example, carbon loss was 1.95 Mg C ha⁻¹ in 1990 in African forests
(Fig. 1), from which 0.31 Mg C ha⁻¹ was released to the atmosphere in
1991; 0.26 Mg C ha⁻¹ in 1992; 0.22 Mg C ha⁻¹ in 1993; 0.07 Mg C ha⁻¹ in
2000 and 0.01 Mg C ha⁻¹ in 2010. Hence, of the full 1.95 Mg C ha⁻¹ dead
tree biomass from 1990, ~50% was released to the atmosphere after 4
years, ~85% after 10 years, and ~97% after 20 years. Finally, for each year
between 1983 and 2014, the total contribution to the atmosphere from
carbon losses was calculated as the sum of all carbon contributions
released at that year, including all carbon loss pools from previous
years that are released during the focal year (an approach similar to
ref. ). We then calculated decadal-scale mean contributions to the
atmosphere from carbon losses to estimate the carbon sink from an
atmospheric perspective, reported in the main text.
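
The delayed release can be sketched in Python as follows. The sketch assumes
the release in year t after a loss is the difference between two exponential
decay terms at a rate of 0.17 per year, which approximately reproduces the
worked example in the text (about 0.30 Mg C ha⁻¹ in the first year for a loss
of 1.95 Mg C ha⁻¹). Function names are hypothetical.

```python
import math

DECAY_RATE = 0.17  # per year, the decomposition constant quoted in the text


def released(x0, t, k=DECAY_RATE):
    """Carbon released during year t (t >= 1) after a loss of x0 in the focal year."""
    return x0 * math.exp(-k * (t - 1)) - x0 * math.exp(-k * t)


def cumulative_fraction(t, k=DECAY_RATE):
    """Fraction of a loss pool released to the atmosphere within t years."""
    return 1.0 - math.exp(-k * t)


def atmospheric_losses(losses_by_year, focal_year):
    """Total release in focal_year from the loss pools of all earlier years."""
    return sum(released(x0, focal_year - year)
               for year, x0 in losses_by_year.items()
               if year < focal_year)
```

Here `losses_by_year` is a hypothetical mapping from calendar year to that
year's carbon loss; summing over earlier years mirrors the final step
described above.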


Predictor variable estimates (1983-2015) 
For each census interval of each plot, we examined potential predictor
variables that may explain the long-term trends in carbon gains and
carbon losses, reported in Table 2 and Extended Data Table 1: first,
the environmental conditions during the census interval; second, the
rate of change of these parameters; and third, forest attributes that
may affect how different forests respond to the same environmental
change. The predictor variable estimates for each census need to avoid
bias due to seasonal variation, for example the intra-annual variability
in atmospheric CO₂ concentration. We therefore applied the following
procedure to avoid seasonal variability impacts on long-term trends:
(1) the length of each focal census interval was rounded to the nearest
complete year (for example, a 1.1-year interval became a 1-year interval);
and (2) we computed dates that minimized the difference between actual
fieldwork dates and complete-year census dates, while ensuring that
subsequent census intervals of a plot do not overlap. The resulting
sequence of non-overlapping census intervals was used to calculate
interval-specific means for each environmental predictor variable to
remove seasonal effects. The mean difference between the actual
fieldwork dates and the complete-year census dates is 0.13 decimal years.
The first group of potential predictor variables, estimated for each
census interval of each plot, are theory-driven choices: atmospheric
CO₂ concentration, MAT and drought intensity, which we quantified
as MCWD.


Atmospheric CO₂ concentration. CO₂ (in ppm) is estimated as the
mean of the monthly mean values from the Mauna Loa record over the
complete year census interval. While atmospheric CO₂ concentration is
highly correlated with time (R² = 0.98), carbon gains are slightly better
correlated with CO₂ (adjusted R² = 0.0027) than with time (adjusted
R² = 0.0025), as expected from theory.


Mean annual temperature. MAT (in °C) was derived from the temporally
resolved (1901-2015) dataset of monthly mean temperature from
the Climatic Research Unit (CRU TS version 4.03; ~3,025-km² resolution;
released 15 May 2019; https://crudata.uea.ac.uk/cru/data/hrg/). We
downscaled the data to ~1-km² resolution using the WorldClim v2
dataset, by subtracting the difference in mean monthly temperature,
and applying this monthly correction to all months. We then calculated
MAT for each complete year census interval of each plot using
the downscaled monthly CRU record.


Maximum climatological water deficit. MCWD (in mm) was derived
from the ~3,025-km² resolution Global Precipitation Climatology Centre
dataset (GPCC version 6.0) that includes many more rain gauges than
CRU in tropical Africa. Because GPCC ends in 2013 we combined it
with satellite-based Tropical Rainfall Measurement Mission data (TRMM
3B43 V7 product, ~757-km² resolution). The fit for the overlapping time
period (1998-2013) was used to correct any systematic difference
between GPCC and TRMM: GPCC′ = a + b × GPCC, with GPCC′ the adjusted
GPCC record and a and b being different parameters for each month
of the year and for each continent. Precipitation was then downscaled
to ~1-km² resolution using the WorldClim dataset, by dividing by
the ratio in mean monthly rainfall, and applying this monthly correction
to all months. For each census interval we extracted monthly
precipitation values and estimated evapotranspiration to calculate
monthly climatological water deficit (CWD), a commonly used metric
of dry season intensity for tropical forests. Monthly CWD values
were calculated for each subsequent series of 12 months (complete
years). Monthly CWD estimation begins with the wettest month of
the first year in the interval, and is calculated as 100 mm per month
evapotranspiration (ET) minus monthly precipitation (P). Then, CWD
values for the subsequent 11 months (i) were calculated recursively
as $\mathrm{CWD}_i = \mathrm{ET} - P_i + \mathrm{CWD}_{i-1}$, where negative
$\mathrm{CWD}_i$ values were set to zero (no drought conditions). This
procedure was repeated for each subsequent complete 12 months. We then
calculated the annual MCWD as the largest monthly CWD value for every
complete year within the census interval, with the MCWD of a census
interval being the mean of the annual MCWD values within the census
interval. Larger MCWD indicates more severe water deficits.
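
The recursion can be sketched in Python as below. This is a simplified
illustration: it treats each year independently and rotates the 12 months so
that the series starts at that year's wettest month, whereas the text starts
at the wettest month of the first year of the interval and then takes
consecutive 12-month blocks. Function names are hypothetical.

```python
ET_MM_PER_MONTH = 100.0  # fixed evapotranspiration assumed in the text


def annual_mcwd(monthly_precip_mm, et=ET_MM_PER_MONTH):
    """Largest cumulative water deficit (mm) over 12 monthly precipitation values."""
    if len(monthly_precip_mm) != 12:
        raise ValueError("expected 12 monthly precipitation values")
    # Start the accumulation at the wettest month (simplifying assumption).
    start = max(range(12), key=lambda i: monthly_precip_mm[i])
    ordered = list(monthly_precip_mm[start:]) + list(monthly_precip_mm[:start])
    cwd = 0.0
    mcwd = 0.0
    for precip in ordered:
        # CWD_i = ET - P_i + CWD_(i-1), with negative values reset to zero.
        cwd = max(0.0, et - precip + cwd)
        mcwd = max(mcwd, cwd)
    return mcwd


def interval_mcwd(years_of_monthly_precip, et=ET_MM_PER_MONTH):
    """MCWD of a census interval: mean of the annual MCWD values."""
    annual = [annual_mcwd(year, et) for year in years_of_monthly_precip]
    return sum(annual) / len(annual)
```

A year in which every month receives at least 100 mm of rain yields an MCWD
of zero, consistent with the "no drought conditions" case.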

We assume evapotranspiration is 100 mm per month on both continents,
based on measurements from Amazonia, more limited measurements
from West Africa summarized in ref., predictive skill, and
use in past studies on both continents. MCWD therefore represents
a precipitation-driven dry season deficit, given that evapotranspiration
remains constant. An alternative assessment, using a data-driven
evapotranspiration product, gave a mean evapotranspiration of 95 mm
and 98 mm per month for the African and Amazonian plot networks
respectively (mean for the 1982-2008 period). Using these values did
not affect the results.

To calculate the environmental change of potential predictor variables,
CO₂-change (in ppm yr⁻¹), MAT-change (in °C yr⁻¹) and MCWD-change
(in mm yr⁻¹), we selected an optimum period over which to
calculate the change, derived empirically by assessing the correlation
of carbon gains (all plots, all censuses) with the change in each
environmental variable, using linear mixed effects models (lmer function
in R, lme4 package). The annualized change in the environmental
variable was calculated as the change between the focal interval and
a prior interval (termed the baseline period) with a lengthening time
window ranging from 1 year through to 80 years before the focal interval
(that is, 80 linear mixed effects models per variable). We calculated
Akaike's Information Criterion (AIC) for each model and selected the
interval length with the lowest AIC. Thus,
$\text{MAT-change} = (\mathrm{MAT}_i - \mathrm{MAT}_b) / (\mathrm{date}_i - \mathrm{date}_b)$,
where $\mathrm{MAT}_i$ is the MAT over the focal census interval
calculated using the procedure described above, $\mathrm{MAT}_b$ is the MAT over
a baseline period before the focal interval, $\mathrm{date}_i$ is the mid-date of the
focal census interval and $\mathrm{date}_b$ is the mid-date of the baseline period.
The lmer results show that the baseline period for MAT-change is 5 years
and for CO₂-change it is 56 years, while MCWD showed no clear trend,
so MCWD-change was not included in the models (see Extended Data
Fig. 3). All three results conform to a priori theoretical expectations.
For CO₂, a maximum response to an integrated 56 years of change is
expected because forest stands will respond most strongly to CO₂ when
most individuals have grown under the new rapidly changing condition,
which should be at its maximum at a time approximately equivalent to
the CRT of a forest stand (mean of 62 years in the pooled dataset).
For MAT, 5 years is consistent with experiments showing temperature
acclimation of leaf- and plant-level photosynthetic and respiration
processes over half-decadal timescales. MCWD has no overall trend,
suggesting that once a drought ends, its impact on tree growth fades
rapidly, as seen in other studies. Furthermore, in the moist tropics
wet-season rainfall is expected to recharge soil water, so lagged impacts
of droughts are not expected.
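
The annualized change and the choice of baseline length can be sketched as
follows. Fitting the 80 mixed effects models is outside the scope of this
sketch, so the AIC values are assumed to be supplied by the caller; all names
are hypothetical.

```python
def annualized_change(focal_value, baseline_value, focal_mid_date, baseline_mid_date):
    """Change per year between a baseline period and the focal census interval."""
    return (focal_value - baseline_value) / (focal_mid_date - baseline_mid_date)


def best_baseline_length(aic_by_length):
    """Return the baseline length (years) whose model has the lowest AIC.

    aic_by_length maps each candidate baseline length to the AIC of the model
    fitted with the change variable computed over that length.
    """
    return min(aic_by_length, key=aic_by_length.get)
```

For example, a MAT of 26.0 °C in an interval centred on 2010.0 against a
baseline of 25.5 °C centred on 2005.0 gives a MAT-change of 0.1 °C yr⁻¹.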

We calculated estimates of two forest attributes that may alter
responses to environmental change as potential predictor variables:
wood density and CRT. In intact old-growth forests, mean wood density
(in g cm⁻³) is inversely related to resource availability, as is seen in
our dataset (carbon gains and plot-level mean wood density are
negatively correlated; Extended Data Fig. 4). Wood density is calculated for
each census interval in the dataset, as the mean wood density of all trees
alive at the end of the census interval, to be consistent with the previous
Amazon analysis. Carbon residence time (CRT, in years) is a measure
of the time that fixed carbon stays in the system. CRT is a potential
correlate of the impact of past carbon gains on later carbon losses.
To avoid circularity in the models, the equation used to calculate CRT
differed depending on the response variable. If the response variable
is carbon loss, the CRT equation is based on gains: CRT = AGC/gains,
with AGC for each interval based on AGC at the end of the interval, and
the gains for each interval calculated as the time-weighted mean of
the gains in the interval and the previous intervals (that is, long-term
gains). If the response variable is carbon gains, the CRT equation is
based on losses: CRT = AGC/losses. The equation employed for use in
the carbon loss model (based on gains) is the standard formula used
to calculate CRT and is retained in the minimum adequate model (see
below and Table 2). The non-standard CRT equation (based on losses)
used in the carbon gain model is not retained in the minimum adequate
model (see below).
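
A short Python sketch of the gains-based CRT follows. It assumes that the
time-weighted mean uses interval lengths as weights, which is one natural
reading of the text; the function names are hypothetical.

```python
def time_weighted_mean(values, interval_lengths):
    """Mean of per-interval values weighted by interval length in years."""
    total_time = sum(interval_lengths)
    return sum(v * t for v, t in zip(values, interval_lengths)) / total_time


def crt_from_gains(agc_end, gains, interval_lengths):
    """CRT (years) = AGC at the end of the interval / long-term mean gains.

    gains and interval_lengths cover the focal interval and all earlier ones.
    """
    return agc_end / time_weighted_mean(gains, interval_lengths)
```

The losses-based variant used for the carbon gain model would follow the same
pattern with losses in place of gains.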


Statistical modelling of the carbon gain, loss and sink trends 
We first constructed two models including those environmental drivers
exhibiting long-term change that impact theory-driven models
of photosynthesis and respiration as predictor variables: CO₂, MAT
and MCWD. One model had carbon gains as the response variable, the
other had carbon losses as the response variable (both in Mg C ha⁻¹ yr⁻¹).
Models were fitted using the lme function in R, with maximum likelihood
(nlme package). All census intervals within all plots were used,
weighted by plot size and census length (using equation (3)). Plot
identity was included as a random effect, that is, assuming that the intercept
can vary randomly among plots. All predictor variables in the models
were scaled without centring (scale function in R, raster package).
Carbon gain values were normally distributed but carbon loss values
required a power-law transformation (λ = 0.361) to meet normality
criteria. Multi-parameter models are: carbon gains = intcp + a × CO₂ +
b × MAT + c × MCWD (model 1); carbon losses = intcp + a × CO₂ + b × MAT +
c × MCWD (model 2); where intcp is the estimated model intercept, and
a, b and c are model parameters giving the slope of relationships with
environmental predictor variables. For multi-parameter model outputs
see Extended Data Table 1; for single-parameter relationships, Fig. 2.
The second pair of models include the same environmental predictors
(CO₂, MAT, MCWD), plus their rate of change (CO₂-change,
MAT-change, but not MCWD-change, as explained above), and forest
attributes that may alter how forests respond to the same environmental
change (wood density, CRT), as described above. We also evaluated
the possible inclusion of a differential continent effect of each variable
in the full model. We first constructed models with only a single
predictor variable, and allowed different slopes in each continent. Next,
if removal of the continent-specific slope (using stepAIC function in
R, MASS package) increased model AIC then the continent-specific
slope was included in the full model for that variable. Only MCWD
showed a significant differential continent-specific slope (P < 0.001).
This implies that forests on both continents have common responses
to CO₂, CO₂-change, MAT, MAT-change, wood density and CRT, but
respond differently to differences in MCWD. This may be because
wet-adapted species are much rarer in Africa than in Amazonia as a result of
large differences in past climate variation. Last, we allowed different
intercepts for the two continents to potentially account for differing
biogeographical or other continent-specific factors. For the carbon
loss model, we applied the same continent-specific effects for slope
as for the carbon gain model. Carbon loss values were transformed
using a power-law transformation (λ = 0.361) to meet normality criteria.

For both carbon gains and losses we parameterized a global model
including the significant continent-specific effect of MCWD, selecting
the most parsimonious simplified model using all-subsets regression.
To do so, we first generated a set of models with all possible
combinations (subsets) of fixed effect terms in the global model using
the dredge function of the MuMIn package in R. We then chose the
best-ranked simplified model based on the second-order Akaike
Information Criterion (known as AICc), hereafter called the 'minimum
adequate carbon gain/loss model', reported in Table 2. The minimum
adequate models are: carbon gains = intcp × continent + a × CO₂-change
+ b × MAT + c × MAT-change + d × MCWD × continent + e × wood density
(model 3); carbon losses = intcp + a × CO₂-change + b × MAT-change +
c × MCWD + d × CRT (model 4). Wood density was retained in the carbon
gain model, probably because growth is primarily affected by resource
availability, whereas CRT was retained in the carbon loss model, probably
because losses are primarily affected by how long fixed carbon is
retained in the system.

Table 2 presents model coefficients of the best-ranked gain model
and best-ranked loss model selected using all-subsets regression.
These best-ranked gain and loss models have weights of 0.310 and
0.132 respectively, which is almost double the weight of the
second-ranked models (0.152 and 0.075 respectively). In Supplementary Table 2
we also used the model.avg function of the MuMIn package to calculate
a weighted mean of the coefficients of the models that together
represent a cumulative weight-sum of 0.95 (that is, a 95% confidence
subset). Supplementary Table 2 (model-averaged) and Table 2
(best-ranked) model parameters are very similar. Supplementary Tables 3
and 4 report the complete sets of carbon gains and loss models that
contribute to the model average results.
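
For readers unfamiliar with model weights, the standard Akaike-weight
calculation and the selection of a 95% confidence subset can be sketched as
follows. This is a generic illustration of the usual formula, not the MuMIn
implementation, and the AICc values passed in are assumed inputs.

```python
import math


def akaike_weights(aicc_values):
    """Akaike weights: exp(-delta/2) normalized to sum to one."""
    best = min(aicc_values)
    relative = [math.exp(-0.5 * (a - best)) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]


def confidence_subset(weights, level=0.95):
    """Indices of the best-ranked models whose cumulative weight reaches level."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = []
    cumulative = 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen
```

Model-averaged coefficients would then be weighted means over the models in
that subset.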

The model-average results show the same continental differences in
sensitivity to environmental variables as the best-ranked models. From
1 January 2000 to 31 December 2014, carbon gains increased owing to
CO₂-change (+3.7% in both the averaged and the best-ranked models,
both continents), whereas temperature rises led to a decline in gains,
which especially had an effect in the Amazon (-1.14% and -1.07% due to
MAT and MAT-change together in the averaged and best-ranked model
respectively). Finally, both model-average and best-ranked models
result in similar predictions of the net carbon sink over the 1 January
1983 to 31 December 2039 period: the future net sink trend in Africa is
-0.004 and -0.003 in the best-ranked and averaged models, respectively;
in Amazonia the future net sink trend is -0.013 and -0.011 in
the best-ranked and averaged models, respectively. The Amazon sink
reaches zero in 2041 using model-averaged parameters compared to
2035 using the best-ranked models.


Estimating future predictor variables to 2040 
To calculate future modelled trends in carbon gains and losses 
(Fig. 3), we first estimated annual records of the predictor variables 
(CO,-change, MAT, MAT-change, MCWD, wood density and CRT) to 31 
December 2039 (Extended Data Fig. 5). 

To do so, we first calculated annual records for the period of the
observed trends for each plot location (that is, from 1 January 1983 to 31
December 2014 in Africa and 1 January 1983 to mid-2011 in Amazonia).
For CO₂-change, MAT, MAT-change and MCWD we extracted monthly
records as described in the Methods section 'Predictor variable
estimates (1983-2014)'. For wood density and CRT we interpolated to a
1-year period within each census interval (as in Fig. 1). Then, we
calculated the mean annual value of each predictor variable from the 244
plot locations in Africa, and separately the mean annual value of each
predictor variable from the 321 plot locations in Amazonia (solid lines in
Extended Data Fig. 5). For each predictor variable, we calculated annual
records of upper and lower confidence intervals by respectively adding
and subtracting 2σ to the mean of each annual value (shaded area in
Extended Data Fig. 5). Second, for each predictor variable we
parameterized a linear model for each continent using the annual records for
the period of the observed trends. Then for each predictor variable,
the continent-specific linear regression models were used to estimate
predictor variables for each plot location from 1 January 2015 to 31
December 2039 in Africa and from mid-2011 to 31 December 2039 in
the Amazon (dotted lines in Extended Data Fig. 5).
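
The extrapolation step amounts to fitting a straight line to the annual means
and evaluating it for future years. The sketch below uses ordinary least
squares written out by hand; it is an assumption that an unweighted fit
matches the authors' linear model, and the function names are hypothetical.

```python
def fit_line(years, values):
    """Ordinary least-squares slope and intercept for values against years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(years, values, future_years):
    """Predict the variable in each future year from the fitted linear trend."""
    slope, intercept = fit_line(years, values)
    return [intercept + slope * year for year in future_years]
```

One such fit per predictor variable and per continent would yield the dotted
trend lines described above.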


Estimating future carbon gain, loss and sink trends

We used the minimum adequate models (Table 2) to predict annual
records of carbon gain, carbon loss and the carbon sink for the plot
networks in Africa and Amazonia over the period 1983 through to 2040
(Fig. 3). We extracted predicted carbon gain and loss values using the
mean annual records for each predictor variable (predictSE.lme
function, AICcmodavg package). Upper and lower confidence intervals
were calculated accounting for uncertainties in the model (both fixed
and random effects) and predictor variables using the 2σ upper and
lower confidence interval for each predictor variable (using
predictSE.lme). Finally, the net carbon sink was calculated by subtracting the
losses from the gains. To obtain sink values in the future, reported in
Table 1, annual per unit area sink predictions (from Fig. 3) were
averaged over each decade and multiplied by the future forest area, as
described above.

To test the sensitivity of the future predictions in Fig. 3, we reran the
analysis by modifying future trajectories of predictor variables one
at a time, while keeping all others the same, to assess the mean C sink
over 2010-15 and 2030 (averaging at 2030 is not necessary as trends
in MAT-change and MCWD, which largely drive modelled inter-annual
variability, are estimated as smooth trends in the future). For each
predictor variable, we explored the potential impacts of the likely bounds
of possibility: (1) by taking the steepest slope of either continent from
the extrapolated trends, doubling this slope and applying it on both
continents; and (2) by taking the steepest slope of either continent
from the extrapolated trends, taking the additive inverse of this slope
and applying it on both continents. These bounds represent deviations
of >2σ from observed trends. Change in MAT also alters MAT-change,
so we present the sensitivity of both parameters together.

Additionally, for CO₂-change and MAT, we also calculated future
slopes under three future Representative Concentration Pathway (RCP)
scenarios with different radiative forcing in 2100: RCP2.6, RCP4.5 and
RCP8.5. Future RCP CO₂-change slopes (ppm yr⁻¹) were calculated using
RCP CO₂ concentration data for the years between 2015 and 2030
inclusive. Future RCP MAT and MAT-change slopes were obtained from
plot-specific MAT values extracted from downscaled ~1-km² resolution data
for current and future climate from WorldClim, and averaged over
19 CMIP5 models. We subtracted the mean 1970-2000 climate MAT
(that is, 1985) from the mean 2040-2060 climate MAT (that is, 2050), and
divided by 65 years to give the annual rate of change. We then calculated
a mean slope over all plots per continent. Finally, to avoid mismatches
between RCP-derived values of CO₂ and MAT and the observed records,
we removed any difference in intercept between the RCP trends and
observed trends, so that the RCP trends were a continuation of the
end-point of the observed trajectory (31 December 2014). We did not
estimate the sensitivity of MCWD under the RCP scenarios, because the
mean of the CMIP5 models does not show drought trends for our forest
plot networks, unlike rain gauge data for the recent past, and thus
would show little or no sensitivity to MCWD. For each modified slope,
Supplementary Table reports the absolute decline in the sink in each
continent in 2030 compared to the 2010-15 mean sink. This shows that
the future sink strength is sensitive to future environmental conditions,
but within both RCP scenarios and our bounds of possibility we show a
decline in the sink strength in both continents over the 2020s.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Source data to generate figures and tables are available from
https://doi.org/10.5521/Forestplots.net/2019_1.


Codeavailability 
Rcode to generate figures and tables is available fror 
org/10.5521/Forestplots.net/2019_1. 
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<S0 km apartare shownas one point for display only, with the circle size 
corresponding tosampling effortin terms of hectares monitored. Land cover 
dataare from TheLand Cover Map for Africain the Year 2000 (GLC2000 
database)". This map was created using the R statistical platform, version 
3.2.1 (ref. ), whichis under the GNU Public License. 


Extended Data Fig.1|Map showing the locations of the 244 plotsincludedin 
this study. Dark green representsall lowland closed-canopy forests, 
submontane forests and forest agriculture mosaics; ight green shows swamp. 
forestsand mangroves, blue circlesrepresent plot clusters, referred toby 
three-letter codes (see Supplementary Table 1 for the fulllist of plots). Clusters 
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Extended Data Fig. 2 |Long-termaboveground carbon dynamics of244 (from tree mortality) (c). Examples of time series for three individual plotsare 
African structurally intactold-growth tropical forest inventory plots. shownin purple, yellow and green, Associated histograms show the 
Points in thescatterplotsindicate the mid-censusinterval date, withhorizontal _ distribution ofthe plot-level net aboveground biomass carbon (witha three- 
bars connectingthe startand end date for each censusinterval for net parameter Weibull probability density distribution fitted in blue, showing that 
aboveground biomass carbon change (a), carbon gains (from woody thecarbon sinkis significantly larger than zero; one-tailed etest:P<0.001) (4), 


production from tree growth and newly recruited stems) (b),andcarbonlosses _ carbongains (e) and carbonlosses (f). 
Extended Data Fig. 3 | AIC from correlations between the carbon gain in tropical forest inventory plots and changes in atmospheric CO₂, temperature (MAT) or drought (MCWD), each calculated over ever-longer prior intervals. Panels show the AIC from linear mixed effects models of carbon gains from 565 African and Amazonian plots and corresponding changes in atmospheric CO₂ (CO₂-change) (a), MAT (MAT-change) (b), and drought (MCWD-change) (c), each plotted against prior interval length (yr). For CO₂ the AIC minimum was observed when predicting the carbon gain from the change in CO₂ calculated over a 56-year-long prior interval length. We use this length of time to calculate our CO₂-change parameter. Such a value is expected because forest stands will respond most strongly to CO₂ when most individuals have grown under the new rapidly changing condition, which should be at its maximum at a time approximately equivalent to the CRT of a forest stand (mean of 62 years in this pooled African and Amazonian dataset). For MAT the AIC minimum was 5 years, which we use as the prior interval to calculate our MAT-change parameter. This length is consistent with experiments showing temperature acclimation of leaf- and plant-level photosynthetic and respiration processes over approximately half-decadal timescales. For MCWD the AIC minimum is not obvious, while the slope of the correlation, shown in d, shows no overall trend and oscillates between positive and negative values, meaning there is no relationship between carbon gains and the change in MCWD over intervals longer than one year; therefore MCWD-change is not included in our models. This result suggests that once a drought ends, its impact on tree growth fades rapidly, as seen in other studies. Furthermore, in the moist tropics wet-season rainfall is expected to recharge soil water, and hence lagged impacts of droughts are not expected.
Extended Data Fig. 4 | Potential forest dynamics-related drivers of carbon gains and losses in structurally intact old-growth African and Amazonian tropical forest inventory plots. The aboveground carbon gains, from woody production (a, b), and aboveground carbon losses, from tree mortality (c, d), are plotted against the CRT and wood density (g cm⁻³) for African (blue) and Amazonian (brown) inventory plots. Linear mixed effects models were performed with census intervals (n = 1,566) nested within plots (n = 565) to avoid pseudo-replication, using an empirically derived weighting based on interval length and plot area (see Methods). Significant regression lines from the linear mixed effects models for the complete dataset are shown as a solid line; non-significant regressions are shown as a dashed line. Each dot represents a time-weighted mean plot-level value; the shading of the dot represents total monitoring length, with empty circles corresponding to plots monitored for <5 years and solid circles for plots monitored for >20 years. Carbon loss data are presented untransformed for comparison with carbon gains; linear mixed effects models on transformed data to fit normality assumptions do not change the significance of the results. Note that CRT is calculated differently for the carbon gains and losses models (see Methods).
Extended Data Fig. 5 | Trends in predictor variables used to estimate long-term trends in aboveground carbon gains, carbon losses and the resulting net carbon sink in African and Amazonian structurally intact old-growth tropical forest inventory plot networks. Mean annual CO₂-change (a), MAT (b), MAT-change (c), MCWD (d), CRT (e) and wood density (f) for African plot locations in blue, and corresponding variables for Amazon plot locations in brown (g-l). Solid lines represent observational data where >75% of the plots were monitored; long-dashed lines are plot means where <75% of plots were monitored. Dotted lines are future values estimated from linear trends from the 1 January 1983 to 31 December 2014 (Africa) or 1 January 1983 to mid-2011 (Amazon) data (slope and P value reported in each panel); see Methods for details. Upper and lower confidence intervals (shaded area) for the past are calculated by respectively adding and subtracting 2σ to the mean of each annual value. Upper and lower confidence intervals for the future (Africa: 1 January 2015 to 31 December 2039; Amazonia: mid-2011 to 31 December 2039) were estimated by adding and subtracting 2σ from the slope of the regression model.
Extended Data Fig. 6 | The change in carbon losses versus CRT of long-term structurally intact old-growth forest inventory plots in Africa and Amazonia. For plots with two census intervals, we calculated the change in carbon losses ('Δlosses') as the carbon losses (in Mg C ha⁻¹ yr⁻¹) of the second interval minus the carbon losses of the first interval, divided by the difference in mid-interval dates. For plots with more than two intervals, we calculated the change in carbon losses for each pair of subsequent intervals, then calculated the plot-level mean over all pairs, weighted by the time length between mid-interval dates. This analysis includes only plots with at least two census intervals that were monitored for a total of ≥20 years (that is, roughly one-third of the mean CRT of the pooled African and Amazon dataset; n = 116). Breakpoint regression was used to assess the CRT length below which forest carbon losses begin to increase. Plots with CRT <77 years show a recent long-term increase in carbon losses; longer-CRT plots do not. Blue points are African plots, brown points are Amazonian plots.
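The plot-level 'Δlosses' calculation described in this legend can be sketched as follows. This is a minimal Python illustration, not the authors' R code; the function name `mean_change_in_losses` and the input layout (a time-ordered list of mid-interval date and carbon-loss pairs for one plot) are assumptions made for the example.

```python
def mean_change_in_losses(intervals):
    """Plot-level mean change in carbon losses (Mg C ha-1 yr-1 per year).

    intervals: time-ordered list of (mid_interval_date_yr, carbon_loss) tuples
    for a single plot (hypothetical input layout).
    """
    if len(intervals) < 2:
        raise ValueError("need at least two census intervals")
    weighted_sum = 0.0
    total_weight = 0.0
    # Walk over each pair of subsequent census intervals.
    for (t0, loss0), (t1, loss1) in zip(intervals, intervals[1:]):
        dt = t1 - t0                      # time between mid-interval dates
        change = (loss1 - loss0) / dt     # change in losses per year
        weighted_sum += change * dt       # weight each pair by dt
        total_weight += dt
    return weighted_sum / total_weight


# Two intervals: a single pair, so the result is simply the pairwise change.
two = mean_change_in_losses([(2000.0, 1.0), (2010.0, 2.0)])
# Three intervals: pairwise changes of 0.2 (over 5 yr) and 0.0 (over 10 yr).
three = mean_change_in_losses([(2000.0, 1.0), (2005.0, 2.0), (2015.0, 2.0)])
```

Because each pairwise change is weighted by the same time difference that divides it, the weighted mean reduces to the difference between the last and first loss values divided by the total time between their mid-interval dates.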
Extended Data Fig. 7 | Trends in net aboveground live biomass carbon, carbon gains and carbon losses from intensively monitored structurally intact old-growth tropical forest inventory plots in Africa. Trends are calculated for the last 15 years of the twentieth century (a-c) and the first 15 years of the twenty-first century (d-f). Plots were selected from the full dataset if their census intervals cover at least 50% of the respective time windows, that is, they are intensely monitored (n = 56 plots for 1 January 1985 to 31 December 1999, and n = 134 plots for 1 January 2000 to 31 December 2014, respectively). Solid lines show mean values, and shading corresponds to the 95% CI, as calculated in Fig. 1. Dashed lines, slopes and P values are from linear mixed effects models, as in Fig. 1. The data show differences compared to Fig. 1, notably the sink decline after about 2010 driven by rising carbon losses. This is because in Fig. 1 we include all available plots over the 1 January 1983 to 31 December 2014 window, which includes clusters of plots monitored only in the 2010s, often monitored for a single census interval, that had low carbon loss and high carbon sink values.
Extended Data Fig. 8 | Twenty-first-century trends in aboveground biomass carbon losses from structurally intact old-growth African tropical forest inventory plots with either long or short CRT. a, b, All plots, that is, as in Fig. 1, but split into a long-CRT group (a) and a short-CRT group (b), each containing half of the 244 plots. c, d, Plots are restricted to those spanning >50% of the time window, that is, intensely monitored plots, as in Extended Data Fig. 7, but split into a long-CRT group (c) and a short-CRT group (d), each containing half of the 134 plots. Solid lines indicate mean values, shading the 95% CI, as for Fig. 1. Dashed lines, slopes and P values are from linear mixed effects models, as for Fig. 1. Carbon losses increase at a higher rate in the short-CRT than the long-CRT group of plots, in both datasets, although this increase is not statistically significant.


Extended Data Table 1 | Models to predict carbon gains and losses in structurally intact old-growth African and Amazonian tropical forests

Carbon gains (Mg C ha⁻¹ yr⁻¹)

Predictor variable    Parameter value    Standard error    t value    P value
Intercept                   4.694             0.739          6.354    <0.001
CO₂ (ppm)                   0.005             0.001          3.196     0.001
MAT (°C)                   -0.143             0.021         -6.844    <0.001
MCWD (mm ×1,000)           -1.232             0.210         -5.878    <0.001

Carbon losses (Mg C ha⁻¹ yr⁻¹)

Predictor variable    Parameter value    Standard error    t value    P value
Intercept                   0.926             1.854          0.499     0.617
CO₂ (ppm)                   0.004             0.004          0.947     0.344
MAT (°C)                   -0.011             0.044         -0.249     0.804
MCWD (mm ×1,000)            0.498             0.505          0.985     0.325

Models to predict carbon gains and losses in structurally intact old-growth African and Amazonian tropical forests, including only environmental variables that show long-term trends that affect theory-driven models of photosynthesis and respiration. Carbon loss values were normalized via power-law transformation, λ = 0.361.
Extended Data Table 2 | Forest area estimates used to calculate total continental forest sink

Intact forest area (Mha)

Period Africa Amazon Southeast Asia Pan-tropics

1980 671.5 958.3 233.6 1863.4 
1985 634.3 921.1 207.4 1762.8 
1990 600.2 885.2 190.6 1676.0 
1995 565.9 851.1 163.5 1580.5 
2000 531.8 817.2 136.9 1485.9 
2005 504.8 784.5 129.2 1418.5 
2010 477.8 756.3 118.4 1352.5 
2015 450.5 726.7 101.5 1278.7 
2020 425.5 698.5 90.1 1214.2 
2025 402.0 671.5 80.0 1153.4 
2030 379.7 645.4 71.0 1096.1 
2035 358.6 620.4 63.0 1042.1 
2040 338.8 596.4 56.0 991.1 


Intact forest area for 1990, 2000, 2008 and 2010 is taken from a published source (that is, the total forest area minus forest regrowth). To estimate intact forest area for the other years in this table, we fitted
exponential models for each continent using the published data.
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section.


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above.


Software and code 


Policy information about availability of computer code
Data collection: No software was used for data collection.
Data analysis: All calculations were performed using the R statistical platform, version 3.2.1 (R Development Core Team, 2015) using the BiomasaFP R package v0.2.1 (Lopez-Gonzalez, Sullivan, & Baker, 2017). Source data and R-code to generate figures and tables are available from
https://figshare.com/s/60f48673202283421f43.
- A list of figures that have associated raw data
- A description of any restrictions on data availability


Source data and R-code to generate figures and tables are available from: https://figshare.com/s/60f48673202283421f43. This data and code package allows reproducing the main Figures and Table 2. All permanent inventory plot data are bound to data-use restrictions defined on ForestPlots.net. To avoid unauthorized use of data, the input files on figshare provide only the information that is necessary to reproduce the figures and the tables using the R scripts.
Ecological, evolutionary & environmental sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Study description 


Research sample 


We reconstruct the evolution of the per unit area African tropical forest carbon sink (in Mg C ha-1 yr-1) over three decades to 2015
(Figure 1). To do so, we collected, compiled and analysed data from 244 repeatedly measured permanent forest inventory plots in 11
African countries. Selected plots are situated in structurally intact old-growth forests and are part of the African Tropical Rainforest
Observation Network (AfriTRON; www.afritron.org; 217 plots) and other sources (27 plots). Plot monitoring periods span 2 to 40
years, between 1968 and 2015 (Extended Data Figure 1). In each plot (mean size, 1.1 ha), all trees ≥100 mm in stem diameter were
identified, mapped and measured on at least two occasions using standardized methods (135,625 trees monitored) and live biomass 
carbon stocks were estimated for each census date, with carbon gains and losses calculated for each interval (Extended Data Figure 
2). We compared trends in the per unit area African tropical forest carbon sink with published long-term trends in the Amazonian 
carbon sink (Brienen, et al. 2015). We pooled the new African and existing Amazonian plot inventory data together to investigate the 
putative environmental drivers of changes in the tropical forest carbon sink, and project its likely future evolution. 


Aboveground Carbon (AGC, in Mg C ha-1) in living biomass for each plot at each census date was estimated as the sum of the AGC of
each living stem, then divided by plot area (in hectares). 


Carbon Gain is the sum of the aboveground live biomass carbon additions from the growth of surviving stems and the addition of 
newly recruited stems, using standard methods (Brienen, et al. 2015). For each stem that survived a census interval, carbon additions 
from its growth (Mg C ha-1 yr-1) were calculated as the difference between its AGC at the end census of the interval and its AGC at 
the beginning census of the interval. For each stem that recruited during the census interval (i.e. reaching DBH ≥100 mm), carbon
additions were calculated in the same way, assuming DBH=0 mm at the start of the interval (Talbot, et al. 2014). The carbon additions
in an interval, from surviving and newly recruited stems, were summed, then divided by the census interval length (in years), and
scaled by plot area (in hectares) (Talbot, et al. 2014). As carbon gains are affected by a census interval bias, with the underestimate 
increasing with census length, we corrected this bias by accounting for (i) the carbon additions from trees that recruited and then 
died within the same interval (unobserved recruitment), and (ii) the carbon additions from trees that grew before they died within an 
interval (unobserved growth) (Talbot, et al. 2014). These typically add <3% to plot-level carbon gains. 


Carbon Loss (in Mg C ha-1 yr-1) is estimated, using standard methods (Brienen, et al. 2015), as the sum of aboveground biomass
carbon from all stems that died during a census interval, divided by the census length (in years) and scaled by plot area (in hectares).
Carbon loss is also affected by the same census interval bias, hence we corrected this bias by accounting for (i) the additional carbon
losses from the trees that were recruited and then died within the same interval, and (ii) the additional carbon losses resulting from 
the growth of the trees that died in the interval (Kohyama, et al. 2018; Talbot, et al. 2014). Calculation details of both components 
are explained in Supplementary Methods. 


Net Carbon Sink (in Mg C ha-1 yr-1) is estimated as carbon gains minus carbon losses. 
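The definitions of carbon gain, carbon loss and net carbon sink above can be illustrated with a short Python sketch. It is a simplified example, not the authors' R code: the function name `interval_fluxes` and the dictionary inputs are hypothetical, and the census-interval bias corrections (unobserved recruitment and unobserved growth) are deliberately left out.

```python
def interval_fluxes(agc_start, agc_end, interval_years, plot_area_ha):
    """Return (gains, losses, net_sink) in Mg C ha-1 yr-1 for one census interval.

    agc_start / agc_end: dicts mapping a stem identifier to its AGC (Mg C) at the
    first and second census (hypothetical input layout). Stems absent from
    agc_start are treated as recruits; stems absent from agc_end as dead.
    No census-interval bias correction is applied in this sketch.
    """
    survivors = agc_start.keys() & agc_end.keys()
    recruits = agc_end.keys() - agc_start.keys()
    dead = agc_start.keys() - agc_end.keys()

    # Growth of surviving stems: AGC at end census minus AGC at start census.
    growth = sum(agc_end[s] - agc_start[s] for s in survivors)
    # Recruits are assumed to start from DBH = 0, so all their AGC is a gain.
    recruitment = sum(agc_end[s] for s in recruits)
    # Stems that died contribute their start-of-interval AGC to losses.
    mortality = sum(agc_start[s] for s in dead)

    scale = interval_years * plot_area_ha
    gains = (growth + recruitment) / scale
    losses = mortality / scale
    return gains, losses, gains - losses


gains, losses, net = interval_fluxes(
    agc_start={"a": 1.0, "b": 2.0},
    agc_end={"a": 1.5, "c": 0.5},
    interval_years=5.0,
    plot_area_ha=1.0,
)
```

Without the bias corrections, the net sink from this function equals the change in total plot AGC between the two censuses divided by interval length and plot area, which gives a quick consistency check on the inputs.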


The estimated mean carbon gains, carbon losses and the net carbon sink of the African plots from 1983-2014, the solid lines in Figure 
1, were calculated following (Brienen, et al. 2015) to allow direct comparison with published Amazonian results. First, each census 
interval value was interpolated for each 0.1-yr period within the census interval. Then, for each 0.1-yr period between 1983 and 
2014, we calculate a weighted mean of all plots monitored at that time, using the square root of plot area as a weighting factor. 
Finally, confidence intervals for each 0.1-yr period are bootstrapped. 
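The time-series construction described above (interval values carried onto 0.1-yr steps, then a mean across plots weighted by the square root of plot area) can be sketched in Python as below. This is an illustration only: the function names and the tuple layout are assumptions, each interval's value is assumed to be held constant across the interval, and the bootstrapped confidence intervals are not reproduced.

```python
import math


def weighted_mean_at(time, intervals):
    """Weighted mean of all census intervals that cover `time`.

    intervals: list of (start_yr, end_yr, value, plot_area_ha) tuples
    (hypothetical layout). Weights are the square root of plot area.
    Returns None when no plot was monitored at `time`.
    """
    numerator = 0.0
    denominator = 0.0
    for start, end, value, area_ha in intervals:
        if start <= time < end:
            weight = math.sqrt(area_ha)
            numerator += weight * value
            denominator += weight
    return numerator / denominator if denominator > 0 else None


def time_series(intervals, first_year, last_year, step=0.1):
    """Evaluate the weighted mean on a regular grid of `step`-year periods."""
    n_steps = int(round((last_year - first_year) / step))
    series = []
    for i in range(n_steps + 1):
        t = first_year + i * step
        series.append((round(t, 1), weighted_mean_at(t, intervals)))
    return series


example = [
    (1990.0, 2000.0, 1.0, 1.0),   # 1-ha plot, weight sqrt(1) = 1
    (1995.0, 2005.0, 4.0, 4.0),   # 4-ha plot, weight sqrt(4) = 2
]
series = time_series(example, 1990.0, 1991.0)
```

Weighting by the square root of area lets larger plots count for more without letting a few very large plots dominate the mean.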
We use data from 244 plots in 11 African countries to present the first assessment of the temporal evolution of the tropical forest
carbon sink in Africa. It represents 10 years of new field campaigns in Africa, extending sampling into extremely remote and
previously unsampled regions. This is the first new manuscript using long-term inventory plots to estimate the intact forest carbon
sink in Africa since (Lewis, et al. 2009) was published in Nature.

Plot selection: 244 permanent inventory plots were selected from 11 countries. These plots are situated in closed canopy (i.e. not
woody savanna) old-growth mixed-age forests and were selected using commonly used criteria (Brienen, et al. 2015; Lewis, et al.
2009; Lewis, et al. 2013): free of fire and industrial logging; all trees with diameter at reference height ≥100 mm measured at least
twice; ≥0.2 ha area; <1500 m a.s.l. altitude; MAT ≥20.0°C (Hijmans, et al. 2005); annual precipitation ≥1000 mm; located ≥50 m from
anthropogenic forest edges.


Sampling strategy


Data collection
No sample size calculation was performed. We selected all available plots meeting the criteria described above. All African tropical
forest regions (West Africa, Lower Guinea, Congo Basin, East Africa) are adequately represented. This is the largest dataset of 
repeatedly measured plots ever used to calculate long-term trends in African forest carbon dynamics. 


Plot inventory data were collected by teams led by at least one of the 104 researchers co-authoring this paper. All permanent
inventory plots are part of one or several networks. Of the 244 plots included in the study, 217 contribute to the African Tropical 
Rainforest Observatory Network (AfriTRON; www.afritron.org), with data curated at www.ForestPlots.net. These include plots from 
Sierra Leone, Liberia, Ghana, Nigeria, Cameroon, Gabon, Republic of Congo, Democratic Republic of Congo (DRC), Uganda and 
Tanzania (Lopez-Gonzalez, et al. 2011; Lopez-Gonzalez, et al. 2009) (Extended Data Figure 1). Fifteen plots are part of the TEAM 
network, from Cameroon, Republic of Congo, Tanzania, and Uganda (Hockemba 2010; Kenfack 2011; Rovero, et al. 2009; Sheil and 
Bitariho 2009). Nine plots contribute to the ForestGEO network, from Cameroon and DRC (Anderson-Teixeira, et al. 2015) (9 plots
from DRC, codes SNG, contribute to both AfriTRON and ForestGEO networks, included above in the AfriTRON total). Finally, three 
plots from Central African Republic are part of the CIRAD network (Claeys, et al. 2019; Gourlet-Fleury, et al. 2013). 


Tree-level aboveground biomass carbon is estimated using an allometric equation (Chave, et al. 2014) with parameters for tree 
diameter, tree height and wood mass density. The estimated aboveground biomass of a plot is the sum of the estimated biomass of 
all live trees at that census date.


Tree Diameter: In all plots, all woody stems with ≥100 mm diameter at 1.3 m from the base of the stem ('diameter at breast height',
DBH), or 0.5 m above deformities or buttresses, were measured, mapped and identified using standard forest inventory methods
(Phillips, et al. 2016). The height of the point of measurement (POM) was marked on the trees and recorded, so that the same POM
is used at the subsequent forest census. For stems developing deformities or buttresses over time that could potentially disturb the
initial POM, the POM was raised approximately 500 mm above the deformity. Estimates of the diameter growth of trees with
changed POM used the ratio of new and old POMs, to create a single trajectory of growth from the series of diameters at two POM
heights (Brienen, et al. 2015; Lewis, et al. 2009; Talbot, et al. 2014). We used standardized protocols to assess typographical errors
and potentially erroneous diameter values (e.g. trees shrinking by >5 mm), missing values, failures to find the original POM, and 
other issues. Where necessary we estimated the likely value via interpolation or extrapolation from other measurements of that tree, 
or when this was not possible we used the median growth rate of trees in the same plot, census and size-class, defined as DBH 
100-199 mm, or 200-399 mm, or >400 mm (Talbot, et al. 2014). We interpolate measurements for 1.3% of diameters, extrapolate 
0.9%, and use median growth rates for 1.5%. 


Tree height: Height of individuals from ground to the top leaf, hereafter Ht, was measured in 204 plots, using a laser hypsometer
(Nikon Forestry Pro) from directly below the crown (most plots), a laser or ultrasonic distance device with an electronic tilt sensor, a
manual clinometer, or by direct measurement, i.e. tree climbing. Only trees where the top was visible were selected (Sullivan, et al.
2018). In most plots, tree selection was similar: the 10 largest trees were measured, together with 10 randomly selected trees per
diameter class from five classes: 100-199 mm, 200-299 mm, 300-399 mm, 400-499 mm, and 500+ mm trees, following standard protocols
(Sullivan, et al. 2018). We use these data and the local heights function in R package BiomasaFP (Lopez-Gonzalez, et al. 2017) to fit
3-parameter Weibull relationships (see Supplementary Methods for a full explanation of this procedure):

$H_t = a \times \left(1 - e^{-b \times (DBH/10)^{c}}\right)$ (equation 1)

We chose the Weibull model as it is known to be robust when a large number of measurements are available (Feldpausch, et al.
2012; Sullivan, et al. 2018). We parameterize this Ht-DBH relationship for four different combinations of edaphic forest type and
biogeographical region (parameters in parentheses): (i) terra firme forest in West Africa (a=56.0; b=0.0401; c=0.744); (ii) terra firme
forest in Lower Guinea and Western Congo Basin (a=47.6; b=0.0536; c=0.755); (iii) terra firme forest in Eastern Congo Basin and East
Africa (a=50.8; b=0.0499; c=0.706); and finally (iv) seasonally flooded forest from Lower Guinea and Western Congo Basin (a=38.2;
b=0.0605; c=0.760). The parameters were used to estimate Ht from DBH for all tree DBH measurements for input into the allometric
equation.
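As a worked illustration of equation 1, the Python sketch below evaluates the three-parameter Weibull height-diameter relationship with the four parameter sets listed above. It is not the BiomasaFP implementation; the function name, the dictionary keys and the reading of the West Africa asymptote as a = 56.0 are assumptions made for this example.

```python
import math

# (a, b, c) per edaphic forest type and biogeographical region, from the text.
# Key names are hypothetical labels; a = 56.0 for West Africa is an assumed reading.
WEIBULL_PARAMS = {
    "terra_firme_west_africa": (56.0, 0.0401, 0.744),
    "terra_firme_lower_guinea_western_congo": (47.6, 0.0536, 0.755),
    "terra_firme_eastern_congo_east_africa": (50.8, 0.0499, 0.706),
    "seasonally_flooded_lower_guinea_western_congo": (38.2, 0.0605, 0.760),
}


def weibull_height_m(dbh_mm, a, b, c):
    """Estimated total tree height (m) from DBH (mm), following equation 1."""
    dbh_cm = dbh_mm / 10.0  # the equation uses DBH/10, i.e. diameter in cm
    return a * (1.0 - math.exp(-b * dbh_cm ** c))


params = WEIBULL_PARAMS["terra_firme_west_africa"]
height_small = weibull_height_m(100.0, *params)   # a 100 mm stem
height_large = weibull_height_m(1000.0, *params)  # a 1,000 mm stem
```

The parameter a acts as the asymptotic maximum height, so estimated heights increase with DBH but never exceed a for the chosen forest type and region.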


Wood Density: Dry wood density (ρ) measurements were compiled for 730 African species from published sources and stored in
www.ForestPlots.net; most were sourced from the Global Wood Density Database on the Dryad digital repository
(www.datadryad.org) (Chave, et al. 2009; Zanne, et al. 2009). Each individual in the tree inventory database was matched to a
species-specific mean wood density value. Species in both the tree inventory and wood density databases were standardized for
orthography and synonymy using the African Flowering Plants Database (www.ville-ge.ch/cjb/bd/africa/) to maximize matches
(Lewis, et al. 2009). For incompletely identified individuals or for individuals belonging to species not in the ρ database, we used the
mean ρ value for the next higher known taxonomic category (genus or family, as appropriate). For unidentified individuals, we used
the mean wood density value of all individual trees in the plot (Lewis, et al. 2009; Lopez-Gonzalez, et al. 2011).


Allometric equation: For each tree we use a published allometric equation (Chave, et al. 2014) to estimate aboveground biomass. We 


Timing and spatial scale 


Data exclusions 


Reproducibility 


Randomization 


then convert this to carbon, assuming that aboveground carbon (AGC) is 45.6% of aboveground biomass (Martin, et al. 2018). Thus
$AGC = 0.456 \times \left(0.0673 \times \left(\rho \times (DBH/10)^{2} \times H_t\right)^{0.976}\right) / 1000$ (equation 2), with DBH in mm, dry wood density, ρ, in g cm-3, and
total tree height, Ht, in m (Chave, et al. 2014).
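Equation 2 can be written out as a small Python function for checking individual tree values. This is a sketch rather than the authors' code: the function and argument names are hypothetical, and the division by 1,000 is taken to convert the allometric estimate from kg to Mg.

```python
def aboveground_carbon_mg(dbh_mm, wood_density_g_cm3, height_m, carbon_fraction=0.456):
    """Aboveground carbon of one tree (Mg C), following equation 2.

    dbh_mm: stem diameter in mm; wood_density_g_cm3: dry wood density in g cm-3;
    height_m: total tree height in m. carbon_fraction defaults to 45.6%.
    """
    dbh_cm = dbh_mm / 10.0
    # Allometric biomass estimate: 0.0673 * (rho * D^2 * H)^0.976
    agb_kg = 0.0673 * (wood_density_g_cm3 * dbh_cm ** 2 * height_m) ** 0.976
    return carbon_fraction * agb_kg / 1000.0


def plot_agc_per_ha(tree_agc_values_mg, plot_area_ha):
    """Plot-level AGC (Mg C ha-1): sum over living stems divided by plot area."""
    return sum(tree_agc_values_mg) / plot_area_ha


# Chosen so that rho * D^2 * H = 1, which leaves only the constants.
unit_case = aboveground_carbon_mg(dbh_mm=100.0, wood_density_g_cm3=1.0, height_m=0.01)
typical_tree = aboveground_carbon_mg(dbh_mm=400.0, wood_density_g_cm3=0.6, height_m=30.0)
```

Keeping the carbon fraction as a keyword argument makes the 45.6% assumption explicit and easy to vary in a sensitivity check.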
The large majority of plots are sited in terra firme forests and have mixed species composition, although four are in seasonally
flooded forest and 14 plots are in Gilbertiodendron dewevrei monodominant forest, a locally common forest type in Africa
(Supplementary Table 1). The 244 plots have a mean size of 1.1 ha (median, 1 ha), with a total plot area of 27.9 ha. The dataset
comprises 391,968 diameter measurements on 135,625 stems, of which 89.9% were identified to species, 97.5% to genus and 97.8%
to family.


Plots were measured at least twice and at most 10 times, between 1968 and 2015. Plot monitoring periods span 2 to 40 years;
mean total monitoring period is 11.8 years, mean census length 5.7 years, with a total of 3,214 ha years of monitoring. The 321
Amazon plots are published and were selected using the same criteria (ref. 6; Brienen, et al. 2015), except that in the African selection
criteria we specified a minimum anthropogenic edge distance and added a minimum temperature threshold.


Brienen, R. J. W., et al. 2015. Long-term decline of the Amazon carbon sink. Nature 519(7543):344-348.


Plots were selected using the criteria described above (section Research sample). Plots that did not meet one or several of these
criteria were not used for analysis. 


Our analysis does not include experimental findings.


Trends in carbon gains, losses and the net carbon sink over time were assessed using linear mixed effects models (lmer function in R,
lme4 package (Bates, et al. 2013)), providing the linear slopes reported in Figure 1. These models regress the mid-point of each
census interval against the value of the response variable for that census interval. Plot identity was included as a random effect, i.e.
assuming that the intercept can vary randomly among plots. Observations were weighted by plot size and census interval length.
Weightings were derived empirically, by assuming a priori that there is no significant relation between the net carbon sink and census
interval length or plot size (Lewis, et al. 2006).


Randomization


Blinding


Did the study involve field
Blinding was not relevant to our study.


work? Yes No 


Field work, collection and transport 


Field conditions 


Location 


Access and import/export 


Disturbance 


All plots are located in African tropical forests receiving at least 1000 mm rainfall annually and with a mean annual temperature
of at least 20°C.


Plots are located at low elevations (<1500 m a.s.l. altitude). A map showing locations of all plots is presented in Extended Data
Figure 1.


This paper is a product of the African Tropical Rainforest Observatory Network (AfriTRON), the TEAM network, the ForestGEO 
network, and the CIRAD network. These permanent inventory plot networks only exist thanks to the support of governments,
local administrations and villages across Africa who have given us permission for, and helped us complete, our fieldwork. A full
list of partner institutions (excluding those in the co-author affiliations) can be found in (on-line only) acknowledgements.
Furthermore, plot inventory data are the product of many field-teams which mainly consisted of local assistants. A full list of
people involved in data collection can be found in (on-line only) acknowledgements, along with a full list of villages and
communities that hosted the field-teams and provided logistical and infrastructural support.


This paper includes 264 plot-censuses (out of 746) that are published for the first time here, including censuses from plots 
located in extremely remote areas such as the Salonga National Park in the heart of the Congo Basin. Each plot-census 
represents several months of preparation, transport, data collection, digitalisation and data quality assessment. 


No significant disturbance was caused by our measurements. Trees were tagged using a single aluminum nail (no iron), avoiding 
damage to trees due to corrosion. 
Most magmatism occurring on Earth is conventionally attributed to passive mantle
upwelling at mid-ocean ridges, to slab devolatilization at subduction zones, or to
mantle plumes. However, the widespread Cenozoic intraplate volcanism in northeast
China and the young petit-spot volcanoes offshore of the Japan Trench cannot
readily be associated with any of these mechanisms. In addition, the mantle beneath
these types of volcanism is characterized by zones of anomalously low seismic
velocity above and below the transition zone (a mantle level located at depths
between 410 and 660 kilometres). A comprehensive interpretation of these
phenomena is lacking. Here we show that most (or possibly all) of the intraplate and
petit-spot volcanism and low-velocity zones around the Japanese subduction zone
can be explained by the Cenozoic interaction of the subducting Pacific slab with a
hydrous mantle transition zone. Numerical modelling indicates that 0.2 to 0.3 weight
percent of water dissolved in mantle minerals that are driven out from the transition
zone in response to subduction and retreat of a tectonic plate is sufficient to
reproduce the observations. This suggests that a critical amount of water may have
accumulated in the transition zone around this subduction zone, as well as in others of
the Tethyan tectonic belt that are characterized by intraplate or petit-spot volcanism
and low-velocity zones in the underlying mantle.


The Cenozoic intraplate volcanism in northeast China is located more
than 1,000 km westward of the Japan Trench, while the young alkaline
basalts (0-6 Ma) known as petit-spots outcrop up to 600 km eastward of
the trench (Fig. 1). The formation mechanism of these types of onshore
and offshore volcanism is still debated, as there is no geological and
geophysical correlation with mantle plumes or arc volcanism.
Seismic tomography models indicate that in this region the Pacific
Plate is currently stagnant in the mantle transition zone (MTZ), extending
continuously up to nearly 1,000 km to the inland of northeast
China. Thus, it has been proposed that the Cenozoic intraplate magmatism
is related to the dehydration of the Pacific slab in the MTZ.
The primary petit-spot magma has been determined to be volatile-rich
with extremely enriched mantle (EM1-like) isotopic compositions.
The lack of hotspot tracks in this region excludes a contribution
from a mantle plume. It has been postulated that the petit-spot magma
forms in the asthenosphere and migrates upward through the oceanic
lithosphere by reactive porous flow in response to plate flexure.
Based on electrical conductivity surveys, the MTZ probably holds
about 0.1 wt% water. The MTZ below northeast China and Japan is
particularly wet, with at least 0.5-1 wt% water. The MTZ is primarily
composed of wadsleyite and ringwoodite minerals that can accommodate
1-3 wt% water, which is 1 to 2 orders of magnitude higher than
the water (hydrogen) solubility in upper- and lower-mantle minerals.
Given the large contrast in water solubility between the MTZ and upper/
lower mantle, it is reasonable to expect deep dehydration melting when
subducting slabs excite vertical flow in the nearby wet MTZ. Indeed,
seismic low-velocity zones (LVZs) above 410 km and below 660 km have
been observed not only in Japan, but also around subduction
zones in Europe and the western United States.

To test this hypothesis, we construct two-dimensional numerical
experiments in which a self-sustained oceanic plate subduction is characterized
by trench retreat and slab stagnation into a homogeneously
or heterogeneously wet MTZ (see Methods). The subducting plate and
entrained dry upper mantle push the adjacent wet MTZ downward
to the lower mantle such that a partially molten layer forms between
700 km and 800 km depth (Fig. 2a, region labelled M2) (Supplementary
Video). On the other hand, MTZ material uplifted to the upper mantle
starts to partially melt above 410 km (Fig. 2a, M3). Slab stagnation and
retreat is accompanied by sub-slab MTZ upwelling and new melting
(M1). These partially molten regions above and below the MTZ cause
large seismic LVZs (Fig. 2c). When melt percolation is active (see Methods),
extraction to the surface occurs, forming intraplate and petit-spot
volcanisms ahead of and behind the trench, respectively.

Figure 3 shows the spatial and temporal trend of modelled volcanics
for the reference model in Fig. 2. The first intraplate volcanism occurs
about 500 km away from the trench, then spreads in two opposite
directions. The mantle water content decreases after melt extraction,
which precludes further deep (≥200 km) melting of the residual
peridotite (Fig. 2b). As the slab rolls back, more distal wet MTZ is sucked
into the upper mantle wedge, such that partial melting and volcanoes
form further away from the trench. The newly generated volcanism
is not homogeneously distributed, as it is strongly influenced by
mantle flow and trench movement. Furthermore, a heterogeneous
distribution of water in the MTZ would prevent the formation of any
temporal-spatial magmatic sequence, as wetter portions would melt
earlier than drier regions at the same pressure and temperature (P-T)
conditions. It is noteworthy that intraplate volcanism also occurs a
few hundred kilometres in front of the slab tip. After about 12 Myr of
modelled subduction, petit-spot volcanoes appear behind the trench.
They are located up to about 300 km seaward of the trench and exhibit
a similar magmatic activity trend to the intraplate volcanism.

We further test the influence of the initial water content in the transition
zone and other parameters on the genesis of asthenospheric melting
(Extended Data Fig. 5). Melting commences 40 km above the transition
zone, and no petit-spot volcanoes are formed for 0.2 wt% initial water.
The thickness of the partially molten layer could range from a few tens
of kilometres to more than 100 km, depending on the melt extraction
efficiency and water content. A petit-spot volcano might be located
more than 600 km from the trench if the melt extraction process is
efficient (Extended Data Fig. 5b). Given the assumed homogeneous
distribution of water in the MTZ, these models provide upper-bound
estimates on the volumes of volcanics and melt. However, the results
also hold for a more realistic heterogeneous distribution of the water
in this mantle level (Extended Data Fig. 5e).

When comparing the model results with seismic and geological
observations, we note that around the Pacific slab, three remarkable
seismic LVZs outside the transition zone are clearly imaged (Fig. 1b).
These are well correlated with the locations of intraplate and petit-spot
volcanoes, and the modelled partially molten zones (Figs. 1a and 2).
Although seismic low-velocity anomalies are generally attributed to
thermal effects, or to the presence of water, melt"™” and/or major
element compositional heterogeneities", it has recently been argued
that some of these LVZs could be artefacts induced by seismic
anisotropy"*. Nevertheless, the authenticity of the sub-slab LVZ1 and
LVZ2 appearing in tomographic models has been confirmed by other
independent studies using, respectively, an accurate scrutiny of the
seismic ray paths” sampling the LVZ1, and receiver functions in the
case of LVZ2". The LVZ3 sits below the active Changbai volcano and
appears to extend down to 410 km as revealed by multiple high-resolution
tomography models*"*. A thermal anomaly from a non-hotspot
upwelling, if it hypothetically exists, is difficult to reconcile with the
large velocity drop of LVZ1. The hot material will rapidly cool when
flowing upward, because of adiabatic decompression and the latent
heat of the wadsleyite-to-olivine reaction. Laboratory experiments
show that seismic wave speeds are insensitive to moderate (<1 wt%)
water contents for olivine” and wadsleyite”; thus, the LVZs are very
likely to be caused by partial melting and/or compositional
heterogeneities. The presence of basalts at the bottom of the upper mantle
can be excluded as it would generate a positive seismic anomaly”*. On
the other hand, basalts accumulating at the base of the MTZ could
be effectively dragged by the slab into the uppermost lower mantle
and generate the LVZ2. However, receiver functions indicate that the
lower-mantle LVZs are within the 750-780 km depth range”, which is
likely to be below the post-garnet phase transition where basalts are
seismically faster than mantle rocks.

Fig. 1 | Geological/geophysical maps and Cenozoic volcanic fields in
northeast China and offshore Japan. a, The red triangles denote volcanoes;
the black and magenta open circles show seismic low velocity at 410 km and
below 660 km, respectively, as determined by receiver functions". The
yellow squares indicate three young alkaline basalt sites (A, B, C), known as
petit spot, offshore the Japan Trench. The black contour lines indicate the
Pacific Plate depths in the mantle. The present-day Pacific Plate front lies
between the Tanlu fault zone (TFZ) and the north-south gravity lineament
(NSGL). Map created with open software GMT 5.4.3. b, Cross-section (thick
grey line in a) of seismic P-wave velocity perturbation with three distinct
low-velocity zones.

The presence of melt in the deep mantle, which is mostly catalysed by
the involvement of volatiles, decreases seismic velocities and provides
a magmatic source for intraplate/petit-spot volcanism. Our numerical
models thus suggest that a hydrous transition zone with at least
0.2-0.3 wt% water beneath northeast China and offshore Japan can
comprehensively explain the LVZs and the intraplate and petit-spot
volcanism. This model does not exclude the devolatilization of the stagnant
Pacific slab as a mechanism to explain the LVZ3 region and the overlying
intraplate volcanism’, which favours the upwelling of volatile-rich
plumes from the MTZ” as envisaged by the Big Mantle Wedge model".
However, the same slab-derived volatiles obviously cannot be the cause
of both the LVZ1 and LVZ2 and of the petit spots, implying the presence of
a metasomatized MTZ before the last subduction episode. The accumulation
of water in the MTZ could be caused by, for example, delamination
of volatile-rich lithospheric roots", or by previous slab dehydration
episodes in the MTZ and subsequent absorption of the water by wadsleyite
and ringwoodite. Along with water, reduced (by redox-freezing)
carbonated sources and restitic K-hollandite-bearing sediments are
required to explain the volatile-rich, alkaline and EM1-type petrological
and geochemical signature of the basalts. This is not surprising, as
the MTZ, a graveyard for stagnating slabs, is the most likely candidate to
host volatiles and subducted sediments, and long-term isolation of these
MTZ domains would be consistent with the ancient metasomatizing
episodes estimated for intraplate basalts. Subsequent subduction
events would mobilize the wet and (carbon + alkali)-bearing MTZ rocks,
promoting the formation of silica-undersaturated magmas in the upper
mantle. It is important to note that the addition of these components is
not critical to our results, because the location and amounts of partial
melting above and below the MTZ will still be dictated by the distribution
of wet MTZ domains, while reduced carbonated sources are expected
to experience redox melting at shallower depths (<250 km)".

Fig. 2 | Dynamics of subduction-induced dehydration melting above and
below the mantle transition zone. a, Composition field. A colour key
indicating different rock types is given at the bottom. Two horizontal black
lines mark depths of 410 km and 660 km. Three partially molten regions
(M1, M2 and M3) are indicated. b, Water content with temperature contours.
c, Seismic P-wave velocity anomalies. An initial water content of 0.3 wt% is
assumed in the MTZ, and the reference melt extraction timescale $t_{\mathrm{ref}}$ = 6 kyr
(see Methods).

Fig. 3 | Volume of volcanics versus time. The volcanics include arc volcanoes
produced by shallow decompression/hydrous melting and intraplate/petit-spot
volcanoes produced by wet deep upper-mantle melting. The trench
location (black line) through the model evolution and the final location of the
slab tip (inverted blue triangle) are also indicated.


The process proposed here could potentially also explain the Cenozoic
anorogenic volcanism in the Mediterranean” and intraplate volcanism
in the Turkish-Iranian Plateau”, regions characterized by the
long-term subduction of the Tethys Ocean. Together with surface
intraplate/petit-spot volcanism, constraints on deep seismic low velocities
and/or high electrical conductivity may thus indicate a volatile-rich
and/or partially molten mantle within and around the transition zone.


Online content


Any methods, additional references, Nature Research reporting sum- 
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con- 
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-020-2045-y. 


1. Chen, Y., Zhang, Y., Graham, D., Su, S. & Deng, J. Geochemistry of Cenozoic basalts and
mantle xenoliths in northeast China. Lithos 96, 108-126 (2007).
2. Wang, X.-C., Wilde, S. A., Li, Q.-L. & Yang, Y.-N. Continental flood basalts derived from the
hydrous mantle transition zone. Nat. Commun. 6, 7700 (2015).
3. Chen, C. et al. Mantle transition zone, stagnant slab and intraplate volcanism in northeast
Asia. Geophys. J. Int. 209, 68-85 (2017).
4. Hirano, N. et al. Volcanism in response to plate flexure. Science 313, 1426-1428 (2006).
5. Okumura, S. & Hirano, N. Carbon dioxide emission to Earth's surface by deep-sea
volcanism. Geology 41, 1167-1170 (2013).
6. Machida, S. et al. Petit-spot geology reveals melts in upper-most asthenosphere dragged
by lithosphere. Earth Planet. Sci. Lett. 426, 267-279 (2015).
7. Pilet, S. et al. Pre-subduction metasomatic enrichment of the oceanic lithosphere
induced by plate flexure. Nat. Geosci. 9, 898-903 (2016).
8. Li, C., van der Hilst, R. D., Meltzer, A. S. & Engdahl, E. R. Subduction of the Indian
lithosphere beneath the Tibetan Plateau and Burma. Earth Planet. Sci. Lett. 274, 157-168
(2008).
9. Tauzin, B., Debayle, E. & Wittlinger, G. Seismic evidence for a global low-velocity layer
within the Earth's upper mantle. Nat. Geosci. 3, 718-721 (2010).
10. Fukao, Y. & Obayashi, M. Subducted slabs stagnant above, penetrating through, and
trapped below the 660 km discontinuity. J. Geophys. Res. Solid Earth 118, 5920-5938
(2013).
11. Liu, Z., Park, J. & Karato, S.-i. Seismological detection of low-velocity anomalies
surrounding the mantle transition zone in Japan subduction zone. Geophys. Res. Lett. 43,
2480-2487 (2016).
12. Wei, S. S. & Shearer, P. M. A sporadic low-velocity layer atop the 410 km discontinuity
beneath the Pacific Ocean. J. Geophys. Res. Solid Earth 122, 5144-5159 (2017).
13. Lustrino, M. & Wilson, M. The circum-Mediterranean anorogenic Cenozoic igneous
province. Earth Sci. Rev. 81, 1-65 (2007).
14. Tang, Y. et al. Changbaishan volcanism in northeast China linked to subduction-induced
mantle upwelling. Nat. Geosci. 7, 470-475 (2014).
15. Zhao, D., Tian, Y., Lei, J., Liu, L. & Zheng, S. Seismic image and origin of the Changbai
intraplate volcano in East Asia: role of big mantle wedge above the stagnant Pacific slab.
Phys. Earth Planet. Inter. 173, 197-206 (2009).
16. Karato, S.-i. Water distribution across the mantle transition zone and its implications for
global material circulation. Earth Planet. Sci. Lett. 301, 413-423 (2011).
17. Kelbert, A., Schultz, A. & Egbert, G. Global electromagnetic induction constraints on
transition-zone water content variations. Nature 460, 1003-1006 (2009).
18. Bercovici, D. & Karato, S.-i. Whole-mantle convection and the transition-zone water filter.
Nature 425, 39-44 (2003).
19. Liu, Z., Park, J. & Karato, S.-i. Seismic evidence for water transport out of the mantle
transition zone beneath the European Alps. Earth Planet. Sci. Lett. 482, 93-104 (2018).
20. Schmandt, B., Jacobsen, S. D., Becker, T. W., Liu, Z. & Dueker, K. G. Dehydration melting at
the top of the lower mantle. Science 344, 1265-1268 (2014).
21. Hier-Majumder, S. & Tauzin, B. Pervasive upper mantle melting beneath the western US.
Earth Planet. Sci. Lett. 463, 25-35 (2017).
22. Mao, Z. et al. Elasticity of hydrous wadsleyite to 12 GPa: implications for Earth's transition
zone. Geophys. Res. Lett. 35, https://doi.org/10.1029/2008GL035618 (2008).
23. Irifune, T. et al. Sound velocities of majorite garnet and the composition of the mantle
transition region. Nature 451, 814-817 (2008).
24. Bezada, M., Faccenda, M. & Toomey, D. Representing anisotropic subduction zones with
isotropic velocity models: a characterization of the problem and some steps on a
possible path forward. Geochem. Geophys. Geosyst. 17, 3164-3189 (2016).
25. Obayashi, M., Sugioka, H., Yoshimitsu, J. & Fukao, Y. High temperature anomalies
oceanward of subducting slabs at the 410-km discontinuity. Earth Planet. Sci. Lett. 243,
149-158 (2006).
26. Zhao, D. & Tian, Y. Changbai intraplate volcanism and deep earthquakes in East Asia: a
possible link? Geophys. J. Int. 195, 706-724 (2013).
27. Cline, C. J., Faul, U. H., David, E. C., Berry, A. J. & Jackson, I. Redox-influenced seismic
properties of upper-mantle olivine. Nature 555, 255-258 (2018).
28. Xu, W., Lithgow-Bertelloni, C., Stixrude, L. & Ritsema, J. The effect of bulk composition
and temperature on mantle seismic structure. Earth Planet. Sci. Lett. 275, 70-79 (2008).
29. Litasov, K. D., Shatskiy, A., Ohtani, E. & Yaxley, G. M. Solidus of alkaline carbonatite in the
deep mantle. Geology 41, 79-82 (2013).
30. Kuritani, T. et al. Buoyant hydrous mantle plume from the mantle transition zone. Sci. Rep.
9, 6549 (2019).
31. Green, H. W. II, Chen, W.-P. & Brudzinski, M. R. Seismic evidence of negligible water
carried below 400-km depth in subducting lithosphere. Nature 467, 828-831 (2010).
32. Mazza, S. E. et al. Sampling the volatile-rich transition zone beneath Bermuda. Nature
569, 398-403 (2019).
33. Wang, X.-J. et al. Mantle transition zone-derived EM1 component beneath NE China:
geochemical evidence from Cenozoic potassic basalts. Earth Planet. Sci. Lett. 465, 16-28
(2017).
34. Kuritani, T., Ohtani, E. & Kimura, J.-I. Intensive hydration of the mantle transition zone
beneath China caused by ancient slab stagnation. Nat. Geosci. 4, 712-716 (2011).
35. Rohrbach, A. & Schmidt, M. W. Redox freezing and melting in the Earth's deep mantle
resulting from carbon-iron redox coupling. Nature 472, 208-212 (2011).
36. Soltanmohammadi, A. et al. Transport of volatile-rich melt from the mantle transition
zone via compaction pockets: implications for mantle metasomatism and the origin of
alkaline lavas in the Turkish-Iranian plateau. J. Petrol. 59, 2273-2310 (2018).


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.


© The Author(s), under exclusive licence to Springer Nature Limited 2020


Nature | Vol 579 | 5 March 2020 | 91


Article 


Methods 


Modelling approach 

The 2D petrological-thermomechanical numerical code I2VIS used in
this study is based on a finite difference method using a marker-in-cell
technique on a staggered grid”. It solves the mass, momentum and energy
conservation equations (1)-(3) on the Eulerian grid and interpolates
physical properties to the markers for advection accordingly.

where $v$ is velocity, $x$ the coordinate, $P$ the dynamic pressure, $\rho$ the density,
$g$ the gravitational acceleration, $c_p$ the heat capacity, $T$ the temperature, $k$ the
thermal conductivity, $H_r$ radioactive heating, $H_s = \sigma'_{ij}\dot{\varepsilon}_{ij}$ shear heating, and
$H_a = T\alpha\,DP/Dt$ adiabatic heating, where $D/Dt$ is the material time derivative. The latent
heat is implicitly considered by computing the effective thermal expansion
and heat capacity.
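
For orientation, the conservation equations referred to as (1)-(3) can be sketched in the form commonly used by staggered-grid marker-in-cell codes. This sketch assumes incompressible flow and uses the symbols defined in the paragraph above; the deviatoric stress $\sigma'_{ij}$ is standard notation assumed here, and the exact form used in the study may differ.

```latex
\begin{align}
\frac{\partial v_i}{\partial x_i} &= 0 \\
\frac{\partial \sigma'_{ij}}{\partial x_j} - \frac{\partial P}{\partial x_i} &= -\rho g_i \\
\rho c_p \frac{DT}{Dt} &= \frac{\partial}{\partial x_i}\left(k\,\frac{\partial T}{\partial x_i}\right) + H_r + H_s + H_a
\end{align}
```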


Model configuration 

The initial model set-up (6,000 × 1,000 km discretized with 1,501 × 501
nodes) is composed of a 3,500-km subducting plate and a 2,500-km
overriding plate. The model imposes a free-slip mechanical boundary
condition at the top, with a 30-km-thick ‘sticky-air’ layer of viscosity
10" Pa s to mimic a free surface; the bottom boundary is no slip, and the side
boundaries are periodic. The bottom no-slip condition is needed to
define an initial horizontal velocity from which finite differences can
be computed for this variable. Comparison with results from a model
with a bottom free-slip condition and closed vertical walls indicates that
the bottom no-slip boundary condition does not affect the subduction
dynamics at all, as the subduction is confined above the lower mantle. The initial
thermal structure is defined by the half-space cooling age for the plates
(50 Myr old) and an adiabatic thermal gradient of 0.5 K km^-1 for the
underlying mantle. The thermal boundary conditions are isothermal
on the top and bottom, while the side boundaries are periodic, consistent
with the mechanical boundary conditions.
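
The initial thermal structure described above can be illustrated with a minimal sketch. It assumes the classical half-space cooling solution for the plates and a linear adiabat below them; the surface temperature, mantle temperature and thermal diffusivity are illustrative values that are not given in the text, and the function names are hypothetical.

```python
import math

SECONDS_PER_MYR = 1e6 * 365.25 * 24 * 3600.0


def plate_temperature(depth_km, age_myr=50.0, t_surface=273.0,
                      t_mantle=1573.0, kappa=1e-6):
    """Half-space cooling temperature (K) at a depth below the plate surface.

    t_surface, t_mantle (K) and kappa (m^2/s) are assumed values.
    """
    if depth_km <= 0.0:
        return t_surface
    depth_m = depth_km * 1e3
    diffusion_length = 2.0 * math.sqrt(kappa * age_myr * SECONDS_PER_MYR)
    return t_surface + (t_mantle - t_surface) * math.erf(depth_m / diffusion_length)


def mantle_adiabat(depth_km, t_reference=1573.0, gradient_k_per_km=0.5):
    """Linear adiabat (K) using the 0.5 K/km gradient quoted in the text."""
    return t_reference + gradient_k_per_km * depth_km


if __name__ == "__main__":
    for z in (0.0, 25.0, 50.0, 100.0):
        print(z, round(plate_temperature(z), 1))
```

A 50-Myr-old plate in this sketch has a thermal diffusion length of roughly 80 km, so temperatures approach the assumed mantle value within about 150 km of the surface.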

To initiate subduction, the subducting slab extends down to about
200 km in the upper mantle together with a rheologically weak zone
on top of it, which lubricates the initial contact between the plates. The
high numerical resolution (4 km × 2 km) used here is needed to ensure
plate contact lubrication at shallow depths and localized,
bending-related hydration at the trench outer rise. Tests at a lower resolution
(4 km × 4 km) result in less-localized slab mantle hydration, whereas
with a resolution of 8 km × 4 km, self-sustained subduction and slab
rollback do not appear spontaneously.


Viscous-plastic rheological model

The rock mechanical behaviour is represented by the effective
viscosity, which combines ductile (dislocation, diffusion and Peierls
creep) and brittle (Drucker-Prager) deformation. The effective ductile
viscosity is given by the harmonic average of the combined rheologies
(parameters and physical meaning are defined in Extended Data
Table 1):

where the dislocation and diffusion creep are given by™:

For hydrated (wet) mantle, viscosity is reduced as a function of water
content, where $C_w$ and $C_{w0}$ are the water content and the reference water
content (100 ppm, which is the water content of the dry upper mantle),
respectively. The Peierls creep viscosity $\eta_{\mathrm{Peierls}}$ is given by”:


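$$\eta_{\mathrm{Peierls}} = \frac{1}{2 A_{\mathrm{Peierls}}\,\sigma_{II}} \exp\left\{ \frac{E_{\mathrm{Peierls}} + P V_{\mathrm{Peierls}}}{RT} \left[ 1 - \left( \frac{\sigma_{II}}{\sigma_{\mathrm{Peierls}}} \right)^{p} \right]^{q} \right\}$$

Parameters are defined in Extended Data Table 1.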
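
The combination of creep mechanisms described above can be sketched as follows. This is a minimal illustration that interprets the "harmonic average" as the usual reciprocal sum for mechanisms acting in parallel; the function name and the sample viscosities are hypothetical, and the study's own expression may include additional factors.

```python
def effective_ductile_viscosity(eta_dislocation, eta_diffusion, eta_peierls):
    """Combine three creep viscosities (Pa s) acting in parallel.

    The weakest mechanism dominates: the result is always lower than the
    smallest of the three inputs.
    """
    return 1.0 / (1.0 / eta_dislocation + 1.0 / eta_diffusion + 1.0 / eta_peierls)


if __name__ == "__main__":
    # Illustrative values only, not taken from the study.
    print(effective_ductile_viscosity(1e20, 1e21, 1e23))
```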


Brittle behaviour occurs when stresses are above the plastic yield
stress $\tau_y$:
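
A minimal sketch of how such a yield limit is commonly applied, assuming a pressure-dependent Drucker-Prager criterion of the form yield stress = cohesion + friction coefficient × pressure, with the cohesion (10 MPa) and friction coefficient (0.6) listed in Extended Data Table 1. The viscosity-capping step and the function names are assumptions for illustration, not the study's stated implementation.

```python
def yield_stress(pressure_pa, cohesion_pa=10e6, friction=0.6):
    """Drucker-Prager yield stress (Pa); negative pressures are clipped to zero."""
    return cohesion_pa + friction * max(pressure_pa, 0.0)


def capped_viscosity(eta_ductile, pressure_pa, strain_rate_ii,
                     cohesion_pa=10e6, friction=0.6):
    """Limit the viscosity so that stress does not exceed the yield stress.

    strain_rate_ii is the second invariant of the strain-rate tensor (1/s).
    """
    eta_yield = yield_stress(pressure_pa, cohesion_pa, friction) / (2.0 * strain_rate_ii)
    return min(eta_ductile, eta_yield)


if __name__ == "__main__":
    print(yield_stress(1e9))
    print(capped_viscosity(1e23, 1e9, 1e-14))
```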
Petrological modelling

Petrological solid-solid phase changes are included through the density
and enthalpy look-up tables for basalt and pyrolite obtained from
PERPLE_X". Therefore, phase transition boundaries at 410 km and
660 km have been considered.

The solidus ($T_s = f(P, T, \mathrm{H_2O})$) and liquidus ($T_l = f(P, T)$) temperatures
for the upper mantle and MTZ are taken from high-pressure
experiments" (Extended Data Fig. 1). At lower-mantle conditions, $T_s$ and $T_l$
vary considerably among different experiments. Here we adopt the dry
solidus and liquidus of chondritic mantle®, as these are more compatible
with the results of KLB-1 peridotite®, while the wet solidus“ was
measured on samples with an estimated water content of 400 ppm wt.

A conservative estimate of the melt fraction in the wet upper
mantle'* is applied:

$$\phi = \frac{W_b - W_{ol}}{W_{melt} - W_{ol}}$$

where $W_b$, $W_{ol}$ and $W_{melt}$ (10 wt%) are the water mass fractions of the ambient
mantle, olivine and melt, respectively. Note that the water solubility in
olivine increases with pressure, so the melt fraction will decrease with
depth if $W_b$ remains constant (Extended Data Fig. 3).
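
The water mass balance behind this estimate can be sketched as follows. The sketch assumes that bulk water is split between olivine at its solubility limit and a melt with a fixed water content of 10 wt%; the function name and the sample bulk and olivine values are illustrative assumptions.

```python
def melt_fraction(w_bulk, w_olivine, w_melt=10.0):
    """Melt fraction from a water mass balance (all inputs in wt%).

    No melt forms while the bulk water content is below the olivine
    storage capacity.
    """
    if w_melt <= w_olivine:
        raise ValueError("melt water content must exceed olivine solubility")
    if w_bulk <= w_olivine:
        return 0.0
    return min(1.0, (w_bulk - w_olivine) / (w_melt - w_olivine))


if __name__ == "__main__":
    # Higher olivine solubility (greater depth) gives less melt for the same bulk water.
    for w_ol in (0.05, 0.1, 0.2):
        print(w_ol, melt_fraction(0.3, w_ol))
```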

The silicate melt density (Extended Data Fig. 2) is taken from
high-pressure sink-float experiments*, which show that the melt becomes
denser than the surrounding mantle at around 400 km (refs. *) owing
to the increased compressibility. However, the presence of water generally
reduces the melt density such that it becomes buoyant relative to
solid mantle (see Extended Data Fig. 2), rendering melt extraction at this
depth possible. We also test the melt density from molecular dynamics
simulations at high-pressure conditions” (Extended Data Figs. 4f, 5d).


Melt extraction timescale 

The distance over which the compaction rate decreases by a factor of
e is the characteristic length scale of the compaction process and is
known as the compaction length, $\delta_c$:

$$\delta_c = \sqrt{\frac{K\left(\zeta + \tfrac{4}{3}\eta\right)}{\eta_f}} \qquad (11)$$

where $\zeta$ and $\eta$ are the effective bulk and shear viscosities, respectively,
of the partially molten rock; $\eta_f$ is the fluid viscosity; $K$ is the permeability
given by the empirical equation:

$$K = K_0\left(\frac{\phi}{\phi_0}\right)^{n} \qquad (12)$$

where $\phi$ is the porosity (melt fraction); $K_0$ (10° m²) is the permeability
at the reference porosity $\phi_0$ (0.01); and $n = 3$.
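
The porosity-permeability law and the compaction length can be evaluated with a short sketch. It assumes the standard compaction-length expression built from the quantities defined above; the reference permeability passed in the example is a placeholder, because its value is not legible in the text, and the viscosities are illustrative.

```python
import math


def permeability(phi, k0, phi0=0.01, n=3):
    """Power-law permeability (m^2) scaled from a reference porosity phi0."""
    return k0 * (phi / phi0) ** n


def compaction_length(phi, k0, bulk_viscosity, shear_viscosity,
                      fluid_viscosity, phi0=0.01, n=3):
    """Compaction length (m) of a partially molten rock."""
    k = permeability(phi, k0, phi0, n)
    return math.sqrt(k * (bulk_viscosity + 4.0 * shear_viscosity / 3.0) / fluid_viscosity)


if __name__ == "__main__":
    # k0 = 1e-15 m^2 is a placeholder value, not the study's reference permeability.
    print(permeability(0.02, 1e-15))
    print(compaction_length(0.02, 1e-15, 1e19, 1e19, 1.0))
```

Because permeability grows with the cube of porosity, doubling the melt fraction raises the permeability eightfold and the compaction length by a factor of about 2.8.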

The relative migration velocity between the melt and the solid matrix
is $w$:

Thus, the extraction timescale $t_e$:
(14)

where $\Delta\rho$ is the density difference between the solid and melt, ranging
from about -100 kg m^-3 to 270 kg m^-3 and typically about 70-180 kg m^-3
in the lowermost upper mantle if most of the water is partitioned into
melt, and the surrounding mantle viscosity $\eta$ = 10"-10*’ Pa s in the upper
mantle. For melt fraction $\phi$ = 0.02, the estimated timescale would be
$t_e$ ≈ 3-20 kyr. Note that this timescale is not the time for melt migration
to the surface, but only illustrates the efficiency of melt segregation from
the solid matrix and the likelihood of its emplacement at shallow depths.
Indeed, a small migration time implies a large $\Delta\rho$ (that is, high buoyancy
force) and/or a high melt fraction (that is, high permeability) and/or a weak
solid matrix that can be easily deformed during compaction/decompaction
processes. In this study, we use a reference value $t_{\mathrm{ref}}$ = 6 kyr.

Melt extraction depends not only on these three parameters (and fluid
viscosity), but also on the dihedral angle (that is, melt interconnectivity).
Previous experiments showed that the dihedral angle decreases
systematically with increasing pressure such that it probably allows for
complete wetting at about 400 km (ref. **) where the dihedral angle is
<5° (ref. ”). At these depths, melt interconnectivity is high even for low
amounts of melt (<1%), which makes melt extraction possible provided
that the extraction timescale is sufficiently low (that is, the melt migration
process is efficient and not hindered by the solid matrix). As such, when
the extraction timescale $t_e$ is smaller than $t_{\mathrm{ref}}$, the material is extracted at
the surface, forming plutonic intrusions or volcanics". Note that when
the density of the melt is higher than that of the solid surrounding mantle,
as occurs between 11.5 GPa and 13.5 GPa when the water content is
low® (Extended Data Fig. 2b), there will be no melt extraction. In these
conditions, the denser melt should percolate downward and accumulate
over the 410-km discontinuity". However, dry melting generally does not
occur at ambient mantle conditions, except when there is an abnormal
heat source associated with a mantle plume. As hydrogen partitions
preferentially into the melt, the water content in the melt would be quite
high, decreasing its density“. As a result, hydrous melt should be less
dense than the solid matrix throughout the upper mantle*.

The melt migration process is illustrated here with more realistic
models accounting for visco-elastoplastic deformation in a two-phase
flow regime. These models demonstrate that melt migration from
the deep upper mantle to the surface should occur through several
mechanisms: viscous diapirism, viscoplastic decompaction channels
and elastoplastic dyking™* (Extended Data Fig. 6). For weak host rocks
where viscous deformation dominates, such as the asthenosphere,
magma migrates by diapirism. When the magma moves through the
lithosphere-asthenosphere boundary (or the lower crust in continents)
where both ductile and brittle deformation occur, the fluid compaction
pressure might reach the tensile strength, and magma could migrate
by channelling. If the host rock is completely elastoplastic, such as the
core of the lithospheric mantle and the upper crust, magma migrates by dyking.


Water budget 

The phase diagram reporting the maximum water content that can be
hosted in hydrated or wet mantle rocks (that is, absorbed by nominally
anhydrous minerals, NAMs) is built upon the compilation from refs."
(Extended Data Fig. 3). It is often assumed that heterogeneously
serpentinized mantle rocks below the oceanic Moho can contain up to
2 wt% H2O (refs. *). As such, the maximum water content in hydrated
rocks* is scaled accordingly.

As the oceanic crust completely dehydrates at about 300 km depth",
generating fluids that fuel arc volcanism only, we assume a dry crust for
the sake of simplicity. On the other hand, dehydration of the underlying
mantle within the transition zone is thought to cause intracontinental
magmatism'®. Consequently, we allow for serpentinization by bending-related
deformation when the strain of mantle rocks is greater than 0.1 (ref.*).

When the rock water content exceeds the saturation limit, decomposition
of hydrous minerals or water exsolution in NAMs occurs, and
fluid markers are generated and migrate according to Darcy's law***
until they are absorbed by dry markers:

$$\mathbf{v}^f = \mathbf{v}^s - \frac{v_{\mathrm{perc}}}{g_z(\rho_s - \rho_f)}\left(\nabla P - \rho_f \mathbf{g}\right) \qquad (16)$$

where $\mathbf{v}^s$, $\mathbf{v}^f$ are the velocities of the solid and fluid phases, respectively;
$\rho_s$, $\rho_f$ are the densities of solid and fluid, respectively; $v_{\mathrm{perc}}$ is a constant
percolation velocity; $\mathbf{g}$ is the gravity acceleration vector as defined in
equation (2), and $g_z$ is its vertical component.

Upon partial melting and extraction, the water is partitioned into
the extracted melt and water in the residual peridotite as:

where D = 0.01 is the hydrogen partition coefficient for olivine
polymorphs.
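
One common way to express this partitioning is the batch (equilibrium) melting relation, sketched below with the partition coefficient D = 0.01 quoted above. The batch-melting assumption and the function name are illustrative choices and may differ from the expressions used in the study.

```python
def partition_water(c_bulk, melt_fraction, d=0.01):
    """Split bulk water between melt and residue under batch melting.

    c_bulk is the bulk water content (any consistent unit, e.g. wt%).
    Returns (water content of the melt, water content of the residue).
    """
    if not 0.0 < melt_fraction <= 1.0:
        raise ValueError("melt_fraction must lie in (0, 1]")
    c_melt = c_bulk / (d + melt_fraction * (1.0 - d))
    c_residue = d * c_melt
    return c_melt, c_residue


if __name__ == "__main__":
    c_melt, c_residue = partition_water(0.3, 0.02)
    print(c_melt, c_residue)
```

With 0.3 wt% bulk water and 2% melt, nearly all the water enters the melt, which is consistent with the statement that the residual peridotite is too dry for further deep melting.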


Falling block tests. The validity of the petrological model used here
can be easily tested with a simple model in which a falling block
(simulating the subducting slab) sinks into the wet MTZ, exciting wet
upwellings to the upper mantle and squeezing water into the lower mantle
(Extended Data Fig. 4). These tests indicate that the melt layer gets
thicker (100 km) when melt extraction is not efficient, owing to very
small amounts of melt/water and/or a denser melt phase. This might
explain the thick low-velocity layers above 410 km in many regions’. After
melt extraction, less water remains above the transition zone, causing
higher viscosity and a lower melt fraction, which yields a larger extraction
timescale: that is, the melt preferentially ponds above 410 km depth.


Seismic velocity anomalies. The seismic velocity perturbations in
Fig. 2c have been computed as:

$$\delta \ln V = \frac{V - V_{\mathrm{ref}}}{V_{\mathrm{ref}}} \qquad (19)$$

where $V_{\mathrm{ref}}$ is the average seismic velocity at a specific depth.
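
As a small illustration of the perturbation defined relative to a depth-averaged velocity, the sketch below takes the velocities sampled along one model row (constant depth) and returns the percentage deviation from their mean. The function name and sample values are hypothetical.

```python
def velocity_perturbation_percent(velocities_at_depth):
    """Percentage perturbation of each velocity relative to the row average."""
    if not velocities_at_depth:
        raise ValueError("at least one velocity is required")
    v_ref = sum(velocities_at_depth) / len(velocities_at_depth)
    return [100.0 * (v - v_ref) / v_ref for v in velocities_at_depth]


if __name__ == "__main__":
    # Illustrative P-wave velocities (km/s) along one depth level.
    print(velocity_perturbation_percent([9.0, 9.0, 8.1, 9.9]))
```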
The change of seismic wave velocities caused by the existence of a
fluid phase is given by:

where $V_S$ and $V_P$ are the shear and compressional wave velocities
of the solid phase; $k$, $\mu$, $\nu$ and $\rho$ are the bulk modulus, shear modulus,
Poisson's ratio and density of the solid phase, respectively.
$\bar{\rho} = (1-\phi)\rho + \phi\rho_f$ is the effective density when fluid (for example,
melt) exists. $K_b$ and $N$ are the bulk and shear moduli, which are
dependent on melt fraction and dihedral angle”:

$$K_b = (1-\phi)\,k\left(1 - (1-\varphi)^{n_k}\right) \qquad (23)$$

$$N = (1-\phi)\,\mu\left(1 - (1-\varphi)^{n_\mu}\right) \qquad (24)$$

where

$$n_k = a_1\varphi + a_2(1-\varphi) + a_3\varphi(1-\varphi)^{1.5} \qquad (25)$$

$$n_\mu = b_1\varphi + b_2(1-\varphi) + b_3\varphi(1-\varphi)^{2} \qquad (26)$$

and $\varphi$ is the dihedral angle

with $A_{ss}$, $A_{sl}$ the area of solid-solid contact and solid-liquid contact,
respectively®.

Extended Data Fig. 7 shows $K_b/k$ and $N/\mu$ for the equilibrium geometry
model at various dihedral angles.
Extended Data Fig. 1 | Solidus and liquidus of basalt and mantle. a, The solidus
and liquidus of basalt are obtained from experimental data®°*". The solidus
b, Solidus and/or liquidus of mantle collected from literature. Sol, solidus; Liq,
liquidus; BrPe, MgSiO3-MgO (bridgmanite + periclase); Fiqul, MgSiO3-SiO2

Extended Data Fig. 2 | Melt density of basalt and mantle for different
temperatures and/or water contents. a, Basalt. PREM, density profile from
Preliminary Reference Earth Model; dry melt density at temperatures of

Extended Data Fig. 5 | Additional parameter tests. a, b, Extraction timescales
inclusions in the transition zone with $t_{\mathrm{ref}}$ = 6 kyr. Note that all the models differ
Extended Data Table 1 | Physical properties of rocks used in this study

Property                  Symbol       Unit          Value
Gravity                   g            m s^-2        9.81
Water content             C_w          wt%           -
Reference water content   C_w0         wt%           0.01
Melt fraction             ϕ            -             -
Melt-weakening factor     α            -             28
Shear modulus             μ            GPa           80

Diffusion creep
  Prefactor               A            s^-1          8.7x10'°
  Activation energy       E            kJ mol^-1     300
  Activation volume       V            cm^3 mol^-1   6
  Burgers vector          b            nm            0.5
  Grain-size exponent     m            -             2.5
  Water exponent          r            -             0.8

Dislocation creep
  Prefactor               A            s^-1          3.5x10
  Activation energy                    kJ mol^-1     540
  Activation volume                    cm^3 mol^-1   20
  Stress exponent         n            -             3.5
  Water exponent          r            -             1.2

Peierls creep
  Prefactor               A_Peierls    Pa^-2 s^-1    10%?
  Activation energy       E_Peierls    kJ mol^-1     532
  Activation volume       V_Peierls    cm^3 mol^-1   12
  Peierls stress          σ_Peierls    GPa           9.1
  Exponents               p, q         -             1, 2

Yield stress              τ_y
  Cohesion                c            MPa           10
  Friction coefficient    μ            -             0.6
Colonization, speciation and extinction are dynamic processes that influence global
patterns of species richness". Island biogeography theory predicts that the
contribution of these processes to the accumulation of species diversity depends on
the area and isolation of the island’*. Notably, there has been no robust global test of
this prediction for islands where speciation cannot be ignored”, because neither the
appropriate data nor the analytical tools have been available. Here we address both
deficiencies to reveal, for island birds, the empirical shape of the general relationships
that determine how colonization, extinction and speciation rates co-vary with the
area and isolation of islands. We compiled a global molecular phylogenetic dataset of
birds on islands, based on the terrestrial avifaunas of 41 oceanic archipelagos
worldwide (including 596 avian taxa), and applied a new analysis method to estimate
the sensitivity of island-specific rates of colonization, speciation and extinction to
island features (area and isolation). Our model predicts, with high explanatory
power, several global relationships. We found a decline in colonization with isolation,
a decline in extinction with area and an increase in speciation with area and isolation.
Combining the theoretical foundations of island biogeography”* with the temporal
information contained in molecular phylogenies" proves a powerful approach to
reveal the fundamental relationships that govern variation in biodiversity across
the planet.

A key feature of global diversity is the tendency for some areas to
harbour many more species than others”*. Uncovering the drivers and
regulators of spatial differences in diversity of simple systems such
as islands is a crucial step to understanding the global distribution
of species richness. The two most prominent biodiversity patterns
in fragmented or isolated environments worldwide are the increase
in species richness with area and the decline in species richness with
isolation". In their theory of island biogeography, MacArthur and
Wilson proposed how the processes of colonization and extinction
could explain these patterns’*. They argued that the rates of these
processes are determined by the geographical context: colonization
decreases with isolation and extinction decreases with area’®. They also
suggested that rates of formation of island endemic species through
in situ speciation increase with island isolation and area®. Despite an
abundance of studies over five decades that support the general
patterns predicted by MacArthur and Wilson”, tests of predictions
regarding the dependence of the underlying processes (colonization,
speciation and extinction) on island geographical context (area and
isolation) are few in number, and are either restricted in temporal,
geographical or taxonomic scope™™, or seek to infer speciation
rates in the absence of data on the phylogenetic relationships among
species*". As a result, there has been no robust and powerful test of
MacArthur and Wilson's predictions on a global scale, and the effect
of area and isolation on biogeographical processes acting on
macroevolutionary timescales remains largely unexplored.

Here we expand on approaches that leverage the information in
time-calibrated molecular phylogenies of insular species to determine
how the processes of colonization, speciation and extinction
are influenced by area and isolation. The dynamic stochastic model
DAISIE (dynamic assembly of islands through speciation, immigration
and extinction) can accurately estimate maximum-likelihood rates of
colonization, extinction and speciation (CES rates) from branching
times (colonization times and any in situ diversification events) and
endemicity status of species that results from one or multiple independent
colonizations of a given island system (for example, all native
terrestrial birds on an archipelago). This method can also detect the
presence or absence of diversity dependence in rates of colonization
and speciation, by estimating a carrying capacity (upper bound to the


Museum für Naturkunde, Leibniz Institute for Evolution and Biodiversity Science, Berlin, Germany. Naturalis Biodiversity Center, Leiden, The Netherlands. Groningen Institute for Evolutionary
Life Sciences, University of Groningen, Groningen, The Netherlands. Unit of Evolutionary Biology/Systematic Zoology, Institute of Biochemistry and Biology, University of Potsdam, Potsdam,
Germany. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK. Museu de História Natural e da Ciência da Universidade do Porto, Porto, Portugal. Centro de Investigação
em Biodiversidade e Recursos Genéticos (CIBIO), InBio, Laboratório Associado, Universidade do Porto, Vairão, Portugal. FitzPatrick Institute, DST-NRF Centre of Excellence, University of Cape
Town, Cape Town, South Africa. Institut de Systématique, Evolution, Biodiversité (ISYEB), Muséum National d'Histoire Naturelle, CNRS, Sorbonne Université, EPHE, UA, Paris, France. Edward
Grey Institute, Department of Zoology, University of Oxford, Oxford, UK. Environmental Futures Research Institute, Griffith University, Brisbane, Queensland, Australia. Research Unit of
Biodiversity (UO-CSIC-PA), Oviedo University, Mieres, Spain. Unité Mixte de Recherche 5174, CNRS-IRD Paul Sabatier University, Toulouse, France. e-mail: uisvalente@naturalis.n


92 | Nature | Vol 579 | 5 March 2020


[Fig. 1 graphic. Map of the 41 archipelagos: circle size shows total species, and symbols
distinguish archipelagos with in situ radiations from those with no extant in situ radiations.
Beside the map, a colonization times plot with the horizontal axis Age (Myr).
Archipelagos by map number: 1 Cocos (Keeling), 2 Chagos, 3 Rapa Nui, 4 Aldabra Group,
5 Ascension, 6 Pitcairn, 7 Bermuda, 8 Niue, 9 Cocos, 10 Chatham, 11 Norfolk,
12 Fernando de Noronha, 13 Socorro, 14 Galapagos, 15 Gough, 16 Society, 17 Ogasawara,
18 Reunion, 19 Marquesas, 20 Juan Fernandez, 21 Azores, 22 Lord Howe, 23 Guadalupe,
24 Mauritius Island, 25 Christmas Island, 26 Samoa, 27 Saint Helena, 28 Comoros,
29 Marianas, 30 Rodrigues, 31 Cape Verde, 32 Tristan da Cunha, 33 Madeira, 34 Palau,
35 Canary Islands, 36 Selvagens, 37 Hawaii, 38 Sao Tomé and Principe, 39 New Caledonia,
40 Tonga, 41 Seychelles (Inner).]


Fig. 1 | Archipelago and island bird colonization time data. Circles show the
number of species that belong to our focal group (both extinct and extant)
found in each archipelago (at the time of human arrival). Numbers on the map
correspond to numbers to the left of the archipelago name. Numbers to the
right of the archipelago name indicate the number of species from our focal
assemblage on the archipelago | the percentage of species sampled in the
phylogenetic trees. Even species not sampled in the trees are accounted for by
including them as missing species that could have colonized at any time since
the emergence of the archipelago. Colonization times plot: grey horizontal
lines indicate archipelago ages (Extended Data Table 1). Violin plots (blue) show
the kernel density of the distribution of times of colonization of bird species in
each archipelago, obtained from the phylogenetic trees. Thick black lines
inside violin plots indicate the interquartile distance; thin black lines indicate


number of species in an island system). Here we extend DAISIE to estimate
the hyperparameters that control the shape of the relationships
between CES rates, and the area and isolation of islands worldwide.

The accurate estimation of fundamental island biogeographical
relationships requires suitable data from many archipelagos, but
divergence-dated phylogenies of complete communities on islands
remain scarce. Hence, we produced new dated molecular phylogenies
for the terrestrial avifaunas of 41 archipelagos worldwide. Here we refer
to both true archipelagos (composed of multiple islands) and isolated
insular units that consist of single islands (for example, Saint Helena)
as ‘archipelagos’. For each archipelago, we compiled avian taxon lists
(excluding introduced, marine, migratory and aquatic species, as well
as birds of prey, rails and nocturnal birds; see Methods) and collected
physical data (Fig. 1 and Supplementary Data 1, 2). We use archipelagos
as our insular unit, because the high dispersal abilities of birds within
archipelagos suggest that, for birds, archipelagos can be considered
equivalent to single islands for less dispersive taxa, and because
archipelagos constitute the most-appropriate spatiotemporal unit
for framing analyses of biodiversity patterns at a large scale. We
extracted colonization and speciation times for each archipelago from


the 95% confidence interval; black dots indicate the median. Archipelagos with
no violin plot or dots are cases for which no species of our focal assemblage
were present at the time of human arrival, or none were sampled using
molecular data. Birds from left to right: Seychelles sunbird, Seychelles magpie
robin, silvereye, Principe thrush, laurel pigeon, dodo (extinct), Mauritius fody,
red-moustached fruit dove (extinct), Galápagos warbler and Norfolk kaka
(extinct). Bird images used with permission from: C. Baeta (Principe thrush),
P. Cascao (Galápagos warbler), M. Hammers (Seychelles sunbird and magpie
robin), J. Hume (dodo), D. Shapiro (Mauritius fody) and J. Varela (laurel pigeon).
There are no in situ radiations in the Mascarenes (Mauritius, Reunion and
Rodrigues) because we treat the islands as separate entities (but see
‘Sensitivity to archipelago selection and isolation metrics’ in the
Methods). Myr, million years.


the phylogenetic trees, producing a ‘global dataset’ for the 41 archipelagos,
which includes the complete extant avifauna of each archipelago,
plus all species known to have become extinct due to anthropogenic
causes. The dataset comprises 596 insular taxa from 491 species. The
phylogenies revealed a total of 502 archipelago colonization events
and 26 independent in situ ‘radiations’ (cases in which diversification
has occurred within an archipelago), which ranged in size from 2 to
33 species (the Hawaiian honeycreepers being the largest clade). The
distribution of colonization times is summarized in Fig. 1 and the full
dataset is provided in Supplementary Data 1.

Our extension of the DAISIE framework enables us to estimate hyperparameters
that control the relationship between archipelago area
and isolation, and archipelago-specific local CES rates, that is, rates of
colonization, cladogenesis (within-archipelago speciation that involves
in situ lineage splitting), anagenesis (within-archipelago speciation by
divergence from the mainland without in situ lineage splitting), natural
extinction rates and carrying capacity. We tested the hypothesis
that area and distance from the nearest mainland have an effect on the
specific CES rates, and, in cases in which a significant effect was identified,
estimated its shape and scaling. We developed a set of a priori

Fig. 2 | Estimated relationships between island area and isolation, and local
island biogeography parameters. Isolation was measured as the distance to
the nearest mainland. Relationships shown are based on the maximum
likelihood global hyperparameters of the best models (equations describing
the relationships are provided in Supplementary Table 1). Darker lines, M14
model; lighter lines, M19 model. Under the M14 model, the cladogenesis rate


models (Supplementary Table 1) in which the CES rates are power-law
functions of archipelago features. Area has been proposed to have a
positive effect on cladogenesis and carrying capacity and a negative
effect on extinction rates. Archipelago isolation is hypothesized to
reduce colonization rates and increase anagenesis rates. Models that
include or exclude diversity dependence in rates of colonization and
cladogenesis (that is, estimating a carrying capacity parameter) were
compared. We also considered a set of post hoc models with alternative
shapes for the relationships (post hoc power and post hoc sigmoid
models; Methods and Supplementary Table 1).

We fitted a set of 28 candidate models to the global dataset using
maximum likelihood (Supplementary Table 2). The shape of the relationship
of CES rates with area and distance for the two best models
is shown in Fig. 2. Under the preferred a priori model (lowest value of
the Bayesian information criterion; M14, eight parameters) colonization
rates decline with archipelago isolation (exponent of the power
law = -0.25 (95% confidence interval = -0.17 to -0.34)) and extinction rate
decreases with area (scaling = -0.15 (-0.11 to -0.18)). Rates of cladogenesis
increase with area (scaling = 0.26 (0.13 to 0.37)), while anagenesis
increases with isolation (scaling = 0.42 (0.24 to 0.61)). The preferred post
hoc model (M19, eight parameters) was also the preferred model overall
and differs qualitatively from the preferred a priori model M14 only
in the cladogenesis function. In the M14 model, cladogenesis is solely
a function of area, whereas in the M19 model cladogenesis depends
interactively and positively on both area and distance from the nearest
mainland, such that the cladogenesis-area relationship is steeper for
more isolated archipelagos (Fig. 2 and Extended Data Fig. 1). In addition,
we found no evidence for diversity dependence, as the carrying capacity
(K) was estimated to be much larger than the number of species on the
island and models without a K parameter (no upper bound to diversity),
such as M14 and M19, performed better than models that included
this parameter (Supplementary Table 2). We also tested whether the
inclusion of a combination of true archipelagos and single islands in
our dataset could have affected our results, for example if opportunities
for allopatric speciation are higher when an area is subdivided into
multiple islands. We repeated analyses in which single island units
were excluded and found that the same model (M19) is preferred with
similar parameter estimates. We therefore discuss only the results for
the main dataset (including both single islands and true archipelagos).
Our results are robust to uncertainty in colonization and branching
times (see ‘Sensitivity to alternative divergence times and tree topologies’
in the Methods).
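
As a rough illustration of what power-law relationships with the exponents reported above imply, the following Python sketch computes how each rate would change when area or distance is multiplied by a given factor. Only the four exponents are taken from the M14 results in the text; the function names and the baseline rate of 1.0 are hypothetical placeholders, not estimates from this study, and the interactive M19 cladogenesis function is not reproduced here.

```python
# Sketch: relative scaling of CES rates under power-law relationships.
# Exponents are the M14 values quoted in the text; BASE_RATE is a
# hypothetical placeholder (the rate when the covariate equals 1).

M14_EXPONENTS = {
    "colonization": ("distance", -0.25),
    "extinction": ("area", -0.15),
    "cladogenesis": ("area", 0.26),
    "anagenesis": ("distance", 0.42),
}
BASE_RATE = 1.0  # assumption for illustration only


def power_law_rate(base_rate, covariate, exponent):
    """Return base_rate * covariate ** exponent for a positive covariate."""
    if covariate <= 0:
        raise ValueError("covariate must be positive")
    return base_rate * covariate ** exponent


def fold_change(process, factor):
    """Factor by which a rate changes when its covariate is multiplied by `factor`."""
    _, exponent = M14_EXPONENTS[process]
    return power_law_rate(BASE_RATE, factor, exponent) / BASE_RATE


if __name__ == "__main__":
    for process, (covariate, _) in M14_EXPONENTS.items():
        change = fold_change(process, 10)
        print(f"{process}: x{change:.2f} per tenfold increase in {covariate}")
```

Under these exponents, a tenfold increase in area would multiply the extinction rate by 10^-0.15 (about 0.71) and the cladogenesis rate by 10^0.26 (about 1.82), whatever the baseline rates are.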

A parametric bootstrap analysis of the two preferred models (M14
and M19) demonstrated that the method is able to recover hyperparameters
with high precision and little bias (Extended Data Fig. 2).
To test the significance of the relationships between area, isolation

depends only on the area. Under the M19 model, the cladogenesis rate
increases with both area and distance to the nearest mainland, and thus lines
for more (far, 5,000 km) and less (near, 50 km) isolated islands are shown. See
Extended Data Fig. 1 for the relationship of cladogenesis with both area and
distance under the M19 model.


and CES rates, we conducted a randomization test on the global
dataset with reshuffled areas and distances. This test estimated
the exponent hyperparameters as zero in most reshuffled cases
(that is, no effect of area or isolation was detected; Extended Data
Fig. 3), confirming that it is the observed relationships between
diversity and archipelago characteristics that generate our parameter
estimates.
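
The logic of such a randomization test can be sketched in Python as follows. This is a simplified stand-in, not the analysis performed here: instead of refitting the DAISIE likelihood, it uses an ordinary least-squares slope on log-log axes as a proxy for a power-law exponent, and the toy data, function names and number of shuffles are all assumptions made for illustration.

```python
import math
import random


def loglog_slope(x, y):
    """OLS slope of log(y) on log(x); a proxy for a power-law exponent."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


def randomization_test(x, y, n_shuffles=999, seed=0):
    """Shuffle the covariate among units and compare the exponents obtained.

    Returns the observed exponent and the proportion of shuffles (plus the
    observed arrangement) whose absolute exponent is at least as large.
    """
    rng = random.Random(seed)
    observed = loglog_slope(x, y)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks the link between covariate and response
        if abs(loglog_slope(shuffled, y)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_shuffles + 1)


# Toy data (hypothetical): a response that follows an exact power law of area.
areas = [5, 10, 50, 100, 500, 1000, 5000, 10000]
richness = [2 * a ** 0.25 for a in areas]

if __name__ == "__main__":
    exponent, p_value = randomization_test(areas, richness)
    print(f"observed exponent = {exponent:.3f}, randomization p = {p_value:.3f}")
```

If the covariate carries no information, the exponents obtained from reshuffled data scatter around zero, which is the pattern described above for the reshuffled areas and distances.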

To assess model fit, we simulated archipelago communities under the
best model (M19) and found that, for most archipelagos, the observed
diversity metrics (the numbers of species, cladogenetic species and
colonizations) were similar to the expected numbers, with some exceptions;
for example, diversity was underestimated for Comoros and
Sao Tomé and Principe (Fig. 3 and Extended Data Fig. 4). The ability of
the model to explain observed values (total species, pseudo-R² = 0.72;
cladogenetic species, pseudo-R² = 0.52; colonizers, pseudo-R² = 0.60)
was very high considering the model includes only 8 parameters (at
least 12 parameters would be needed if each rate depended on area and
isolation, and at least 164 parameters if each archipelago was allowed
to have its own parameters) and was able to explain multiple diversity
metrics. This represents a very large proportion of the explanatory
power that would be expected to be obtained for data generated under
the preferred model (Extended Data Fig. 5). Simulations under the best
model reproduced the classic observed relationships between area,
distance and diversity metrics (Fig. 4).
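
The comparison of an observed diversity metric with its simulated distribution can be sketched in Python as below. The sketch assumes that the 95% interval is taken as the 2.5th to 97.5th percentiles of the simulated values, which is not stated explicitly in the text; the function names and the example numbers are hypothetical.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def classify_fit(observed, simulated):
    """Label a metric by where the observed value falls relative to the
    central 95% interval of the simulated values (assumed 2.5-97.5%)."""
    values = sorted(simulated)
    low = percentile(values, 2.5)
    high = percentile(values, 97.5)
    if observed > high:
        return "underestimated"  # the model predicts too few
    if observed < low:
        return "overestimated"  # the model predicts too many
    return "well estimated"


if __name__ == "__main__":
    # Hypothetical simulated species counts for one archipelago.
    simulated_counts = [10 + (i % 21) for i in range(1000)]
    for observed in (5, 20, 45):
        print(observed, classify_fit(observed, simulated_counts))
```

An archipelago whose observed richness lies above the simulated interval is labelled as underestimated, matching the wording used for Comoros and Sao Tomé and Principe.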

Our approach reveals the empirical shape of fundamental biogeographical
relationships that have previously been difficult to estimate.
In agreement with recent studies, we found strong evidence for a
decline in the rates of colonization with isolation and in the rates of
extinction with area, confirming two of the key assumptions of island
biogeography theory. The colonization-isolation effect was detected
despite the fact that the decline in avian species richness with distance
from the nearest mainland in our empirical data was not as pronounced
as in other less-mobile taxa, revealing that isolation is a clear determinant
of the probability of immigration and the successful establishment
of populations even in a highly dispersive group such as birds.
The extinction-area relationship has been a fundamental empirical
generalization in conservation ecology (for example, for the design
of protected areas); here we were able to characterize the shape of
this dependence at the global spatial scale and macro-evolutionary
timescale.

We provide insights into the scaling of speciation with area and isolation.
In contrast to previous studies on within-island speciation, which
have suggested the existence of an area below which cladogenesis does
not take place on single islands, we do not find evidence for such an
area threshold at the archipelago level and, under our model, speciation
is predicted to be non-zero even in small areas. In addition, our post


Fig. 3 | Goodness of fit of the preferred model (M19). The map identifies
whether the diversity metrics were well estimated (the empirical value matches
the 95% confidence interval of simulations), underestimated (the empirical
value is higher than the 95% confidence interval) or overestimated (the


hoc finding that rates of cladogenesis increase through an interactive
effect of both island size and distance from the nearest mainland (Fig. 2
and Extended Data Fig. 1) provides a mechanism that limits radiations
to archipelagos that are both large and remote. Why this interaction
exists requires further investigation, but one possibility is that

empirical value is lower than the 95% confidence interval). Intervals are based
on 1,000 simulations of each archipelago (Extended Data Fig. 4). Numbers on
the map indicate the archipelagos described in Fi


unsaturated niche space provides greater opportunities for diversification.
In addition to the effects of physical features on cladogenesis,
we found that rates of anagenesis increase with island isolation. While
impressive insular radiations tend to receive the most attention from
evolutionary biologists (for example, Darwin's finches or Hawaiian


[Fig. 4 graphic. Six panels (a-f) comparing observed and predicted values. Panels a-c
have Area (km²) on the horizontal axis; panels d-f have Distance to nearest mainland (km)
on the horizontal axis.]


Fig. 4 | Observed and predicted island diversity-area and island diversity-distance
relationships. Grey vertical lines show the 95% confidence intervals
across 1,000 datasets simulated for each of the 41 archipelagos assuming the
M19 model. Blue points indicate the mean values of the simulations; the blue
line indicates the fitted line for the simulated data; red points are the observed
values in the empirical data; the red line shows the fitted line for the empirical
data; the red shaded area is the 95% confidence interval of the predicted
relationship for the empirical data. a-c, Relationships between island diversity
and area. a, Total number of species. b, Cladogenetic species. c, Number of
colonizations. d-f, Relationships between island diversity and distance of the
island to the mainland. d, Total number of species. e, Cladogenetic species.
f, Number of colonizations.

honeycreepers), our phylogenies revealed that the majority of endemic
birds in our dataset in fact display an anagenetic pattern (at the time of
human arrival, 231 out of 350 endemic species had no extant sister taxa
on the archipelago and there were only 26 extant in situ radiations).
The positive effect of archipelago isolation on rates of anagenesis that
we estimate suggests that this fundamental but overlooked process
is impeded by high levels of movement between island and mainland
populations.

A variety of global patterns of biodiversity have been described,
from small islands and lakes to biomes and continents, but the
processes that underpin these patterns remain to be explored. Our
simulations using parameters estimated from data were able to
reproduce the classic global patterns of island biogeography across
41 archipelagos (Fig. 4). This advances our understanding of macroscale
biology, by providing missing links between local processes,
environment and global patterns. More than half a century after
the seminal work of MacArthur and Wilson, we now have the data
and tools to go beyond statistical descriptions of diversity patterns,
enabling us to quantify community-level processes that have long
been unclear.

Methods


Archipelago selection

We focus on oceanic islands, that is, volcanic islands that have never been
connected to any other landmass in the past. We also include the Granitic
Inner Seychelles, even though these islands have a continental origin,
because they have been separated from other landmasses for a very long
period of time (64 million years) and can be considered quasi-oceanic,
as all extant avian species originated in much more recent times. The
41 archipelagos chosen are located in the Atlantic, Indian and Pacific
Oceans, with latitudes between 45° north and south. Islands within
these archipelagos are separated by a maximum of 150 km. The sole
exceptions are the Azores and Hawaii, two very isolated systems where
the distances between some islands exceed this value. The shape files
used to plot the maps of Figs. 1, 3 were obtained from a previous study.


Physical and geological data 

Full archipelago data are provided in Supplementary Data 2 and Extended
Data Table 1. We obtained data on the total contemporary landmass
area for each archipelago. For our isolation metric, we computed the
minimum round-Earth distance to the nearest mainland (Dm) in km using
Google Earth. We considered ‘nearest mainland’ to be the nearest probable
source of colonists (but see ‘Sensitivity to archipelago selection
and isolation metrics’ for different isolation metrics). This is the nearest
continent except for island groups that were closer to Madagascar, New
Guinea or New Zealand than to the continent, in which case we assigned
these large continent-like islands as the mainland. This is supported by
our phylogenetic data; for example, many Indian Ocean island taxa have
closest relatives on Madagascar rather than mainland Africa.
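
A round-Earth (great-circle) distance of this kind can be approximated with the haversine formula. The Python sketch below is only an illustration of the calculation: the distances in this study were measured in Google Earth, and the mean Earth radius, function names and example coordinates are assumptions introduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the valid range.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_mainland_km(island, mainland_points):
    """Minimum distance from an island (lat, lon) to candidate mainland points."""
    return min(great_circle_km(island[0], island[1], lat, lon)
               for lat, lon in mainland_points)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    island = (0.0, 0.0)
    coast = [(0.0, 5.0), (3.0, 4.0), (-10.0, 2.0)]
    print(f"{nearest_mainland_km(island, coast):.1f} km")
```

In practice the candidate points would trace the coastline of the landmass treated as the mainland, including Madagascar, New Guinea or New Zealand where these are closer than the continent.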

Island palaeo-areas and past archipelago configurations have been
shown to be better predictors of endemic insular diversity than contemporary
area. By contrast, island total native and non-endemic
richness is better predicted by present island characteristics. As
insufficient data on island ontogeny were available (that is, describing
the empirical area trajectories from island birth to present), we analysed
contemporary area and isolation as these are currently the most
appropriate units for our dataset.

We conducted an extensive survey of the literature and consulted
geologists to obtain the geological ages for each archipelago (Extended
Data Table 1), treating the age of the oldest currently emerged island
as an upper bound for colonization. Islands may have been submerged
and have emerged multiple times and we consider the age of the last
known emergence. For the Aldabra Group we used an age older than
the published estimate. The current estimated age of re-emergence
of Aldabra is 0.125 million years, but 9 out of 12 Aldabra colonization
events in our dataset are older, suggesting that the archipelago was not
fully submerged before this and may have been available for colonization
for a longer period. Therefore, for Aldabra we used an older upper
bound of 1 million years for colonization, although we acknowledge
that the mitochondrial markers used for dating may not provide sufficient
resolution at the shallow temporal scale of the published age.
For Hawaii, the colonization times that we obtained for more than half
of the colonization events were older than the age of the current high
islands that is often used as a maximum age for colonization (around
5 million years). Therefore, instead of this age, we used the much older
estimate of 29.8 million years of the Kure Atoll to account for currently
submerged or very low-lying Hawaiian Islands that could have received
colonists in the past. For Bermuda and Marianas, we could not find age
estimates in the literature, and we therefore consulted geologists to
obtain these (P. Hearty, R. Stern and M. Reagan, personal communication;
Extended Data Table 1).


Island avifaunas 
Our sampling focused on native resident terrestrial birds and we considered
only birds that colonize by chance events (for example, hurricanes
or rafts). We thus excluded marine and migratory species, because
they are capable of actively colonizing an island at a much higher rate.
We focused on songbird-like and pigeon-like birds, which constitute
the majority of terrestrial (land-dwelling) birds on islands. Following
a precedent set by previous work, we included only species from
the same trophic level (in the spirit of MacArthur and Wilson’s model):
we excluded aquatic birds, birds of prey, rails (many are flightless or
semi-aquatic) and nightjars (nocturnal). We also excluded introduced
and vagrant species. Including species such as rails and owls (which
are components of many island avifaunas) would have led to a higher
estimate of the product of colonization rate and mainland pool size due
to a larger mainland pool, and potentially to higher estimated rates of
anagenesis (many owl or rail species are island endemics with no close
relatives on the islands).

For the focal avian groups, we compiled complete taxon lists for
each of the 41 archipelagos based on recent checklists from Avibase
(http://avibase.bsc-eoc.org), which we cross-checked with the online
version of the Handbook of the Birds of the World (HBW). We followed
the HBW’s nomenclature and species assignations, except for 12 cases
in which our phylogenetic data disagree with HBW’s scheme (noted in
the column ‘Taxonomy’ of Supplementary Data 1). For example, in 11
cases phylogenetic trees support raising endemic island subspecies
to species status (we sampled multiple samples per island taxon and
outgroup, and the island individuals form a reciprocally monophyletic
well-supported clade), and for these taxa we decided it was more
appropriate to use a phylogenetic species concept so as not to underestimate
endemicity and rates of speciation (Supplementary Data 1).
We re-ran DAISIE analyses using HBW’s classification and found that the
maximum-likelihood parameters are very similar and thus we report
only the results using the scheme based on the phylogenies produced
for this study.

For each bird species found on each archipelago, we aimed to sample
sequence data for individuals on the archipelago and the closest relatives
outside the archipelago (outgroup taxa). Our sampling success
per archipelago is shown in Fig. 1 and Extended Data Table 1.


Extinct species 
We do not count extinctions with anthropogenic causes as influencing
the natural background rate of extinction. Therefore, we explicitly
include species for which there is strong evidence that they have been
extirpated by humans. We treat taxa extirpated on an archipelago by
humans as though they had survived in that archipelago until the present,
following our previously published approach.

We identified anthropogenic extinctions based on published data
and personal comments (J. A. Alcover and J. C. Rando on unpublished
Macaronesian taxa; F. Sayol and S. Faurby). We include the species
present on the islands that belong to our archipelago definition as
described in Supplementary Data 2. We excluded largely hypothetical
accounts or pre-Holocene fossils that greatly predate human arrival.
Our dataset accounts for 153 taxa that were present on first human
contact and have gone extinct since, probably because of human
activities including the introduction of invasive species by humans.
To our knowledge, 71 of these taxa have previously been sequenced
using ancient DNA or belong to clades present in our trees, and we
were thus able to include them in the phylogenetic analyses as regular
data (n = 54), or as missing species by adding them as unsampled species
to a designated clade (n = 17). For the remaining 82 extinct taxa,
sequences were not available and we were unable to obtain samples and
to allocate them to clades. We assume that these taxa represent extinct
independent colonizations and we included them in the analyses using
the ‘Endemic_MaxAge’ and ‘Non_endemic_MaxAge’ options in DAISIE,
which assume that they have colonized at any given time since the birth
of the archipelago (but before any in situ cladogenesis event). As an
example, our dataset includes the 27 species of Hawaiian birds belonging
to our focal group that are known to have gone extinct since human
colonization. Eight of these species were included using DNA data,
17 were added as missing species to their clades (14 honeycreepers and
3 Myadestes) and two were added using the Endemic_MaxAge option
in DAISIE (Corvus impluviatus and Corvus viriosus).


Sequence data from GenBank 

We conducted an extensive search of GenBank for available DNA
sequences from the 596 island bird taxa that fitted our sampling criteria
and from multiple outgroup taxa, using Geneious v.11. The molecular
markers chosen varied from species to species, depending on which
marker was typically sequenced for the taxon in question; the most
commonly sequenced marker was cytochrome b. In total, we downloaded
3,155 sequences from GenBank. For some taxa, sequences from
both archipelago and close relatives from outside the archipelago were
already available from detailed phylogenetic or phylogeographical
analyses. In some cases, a target species had been sampled, but only
from populations outside the archipelago. In other cases, the species on
the archipelago had been sampled, but the sampling of the relatives outside
of the archipelago was lacking or only available for distant regions,
which meant a suitable outgroup was not available in GenBank. Finally,
for some species there were no previous published sequences available
in GenBank. GenBank accession numbers and geographical origin for
the downloaded sequences are provided in the DNA matrices
(https://doi.org/10.17632/vf95364vx6.1) and maximum clade credibility trees
(https://doi.org/10.17632/p6hm5w8s3b.2) uploaded to Mendeley Data.


Sequence data of new samples

Sequences available in GenBank covered only 54% (269 out of 502)
of the total independent colonization events. We improved the sampling
by obtaining new sequences for many island taxa (n = 174 taxa)
and from their close relatives from continental regions (n = 78). We
obtained new samples from three sources: field trips, research collections
and colleagues who contributed field samples. New samples
were obtained during field trips conducted by M.M. (Gulf of Guinea
and African continent); B.H.W. and C.T. (Comoros and Mayotte,
Mauritius Island, Rodrigues, Seychelles); S.M.C. (New Caledonia); J.C
(Macaronesia, Europe and Africa) and L.V. (New Caledonia), between
1999 and 2017. Samples of individuals were captured using mist-nets or
spring traps baited with larvae. Blood samples were taken by brachial
venipuncture, diluted in ethanol or Queen's lysis buffer in a microcentrifuge
tube. Birds were released at the point of capture. Aldabra
Group samples were obtained from research collections of the Seychelles
Islands Foundation. Museum samples from several Galapagos
and Comoros specimens were obtained on loan from, respectively,
the California Academy of Sciences and the Natural History Museum
London. Additional samples from various localities (Aldabra Islands,
Iberian Peninsula, Madagascar and Senegal) were provided by collaborators,
as indicated in Supplementary Table 3. Sample information
and GenBank accession numbers for all new specimens are provided
in Supplementary Table 3.

DNA was extracted from blood, feathers and museum toe-pad samples
using QIAGEN DNeasy Blood and Tissue kits (Qiagen). For museum
samples, we used a dedicated ancient DNA laboratory facility at the
University of Potsdam to avoid contamination. The cytochrome b region
(1,100 base pairs) was amplified using the primers shown in Extended
Data Table 2. DNA from historical museum samples was degraded and
cytochrome b could not be amplified as a single fragment. We thus
designed internal primers to sequence different overlapping fragments
in a stepwise manner (Extended Data Table 2).

PCRs were set up in 25-µl total volumes including 5 µl of Bioline
MyTaq buffer, 1 µl (10 mM) of each primer and 0.12 µl MyTaq polymerase.
PCRs were performed with the following thermocycler conditions:
initial denaturation at 95 °C for 1 min, followed by 35 cycles of
denaturation at 95 °C for 20 s, annealing at 48 °C for 20 s and
extension at 72 °C for 15 s, and a final extension at 72 °C for 10 min.


Amplified products were purified using exonuclease I and Antarctic
phosphatase, and sequenced at the University of Potsdam (Unit of
Evolutionary Biology/Systematic Zoology) on an ABI PRISM 3130xl
sequencer (Applied Biosystems) using the BigDye Terminator v3.1 
Cycle Sequencing Kit (Applied Biosystems). We used Geneious v.11 to 
edit chromatograms and align sequences. 


Phylogenetic analyses 

To estimate times of colonization and speciation for each archipelago,
we produced new divergence-dated phylogenies or compiled published
dated trees, to yield a total of 91 independent phylogenies (maximum
clade credibility trees and posterior distributions for all new trees
produced for this study are deposited in Mendeley Data,
https://doi.org/10.17632/péhmSw8s3b.2; the 11 previously published
trees are available upon request). Information on all alignments and
trees, including molecular markers, data sources, calibration methods
and substitution models, is provided in Extended Data Tables 3, 4 and
Supplementary Table 4. The majority of alignments and phylogenies focus
on a single genus, although some include multiple closely related genera
or higher-order clades (family, order) depending on the diversity and
level of sampling of the relevant group (taxonomic scope is described in
Extended Data Tables 3, 4). Most alignments include taxa from a variety
of archipelagos. Alignments were based on a variety of markers, according
to which marker had most often been sequenced for a given group.

For the new dating analyses conducted for this study, we created 
80 separate alignments for different groups using a combination 
of sequences from GenBank (n = 3,155) and new sequences (n = 252)
produced for this study. In some cases, we obtained DNA alignments 
directly from authors of previous studies and these are credited in 
Extended Data Table 3. Phylogenetic divergence dating analyses were 
performed in BEAST 2. For each alignment, we performed substitution
model selection in jModelTest using the Bayesian information criterion
(BIC). We used rates of molecular evolution for avian mitochondrial
sequences, which have been shown to evolve in a clock-like manner at
an average rate of around 2% per million years. Molecular rate
calibrations can be problematic for ancient clades, owing to high levels
of heterotachy in birds. In addition, mitochondrial DNA saturates after
about 10-20 million years, and genetic distances of more than 20% may
provide limited information regarding dating. Therefore, we only used
molecular rate dating to extract node ages for branching events at the
tips of the trees, at the species or population level (the oldest
colonization time in our dataset is 15.3 million years, but most are
much younger). Rates of evolution were obtained from the literature and
varied between markers and taxonomic groups (Supplementary Table 4). We
applied the avian mitochondrial rates estimated using cytochrome b
from a previous study (but see ‘Sensitivity to alternative divergence
times and tree topologies’ for different rates).
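
As a rough illustration of what a molecular rate calibration implies, the
sketch below converts a pairwise genetic distance into a divergence time
under a strict clock. It assumes that the 2% figure refers to pairwise
sequence divergence per million years and ignores saturation and rate
variation. The function name `divergence_time_myr` is hypothetical and is
not part of BEAST 2, which estimates node ages under the relaxed clock
model described in this section.

```python
def divergence_time_myr(pairwise_distance: float, rate_per_myr: float = 0.02) -> float:
    """Divergence time (Myr) implied by a pairwise distance under a strict clock.

    pairwise_distance: proportion of differing sites (0.10 means 10%).
    rate_per_myr: assumed pairwise divergence rate per million years.
    """
    if rate_per_myr <= 0:
        raise ValueError("rate_per_myr must be positive")
    if pairwise_distance < 0:
        raise ValueError("pairwise_distance must be non-negative")
    return pairwise_distance / rate_per_myr


if __name__ == "__main__":
    # Distances near 20% sit where the text warns that saturation limits dating.
    for distance in (0.02, 0.10, 0.20):
        print(f"{distance:.2f} -> {divergence_time_myr(distance):.1f} Myr")
```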

We applied a Bayesian uncorrelated log-normal relaxed clock model.
For each analysis, we ran two independent chains of between 10 and
40 million generations, with a birth-death tree prior. We assessed
convergence of chains and appropriate burn-ins with Tracer, combined
runs using LogCombiner and produced maximum clade credibility
trees with mean node heights in TreeAnnotator. We produced a total
of 80 maximum clade credibility trees.

For 11 groups (Extended Data Table 4), well-sampled and rigorously
dated phylogenies were already available from recent publications, 
all of which conducted Bayesian divergence dating using a variety of 
calibration methods, including fossils and molecular rates. We obtained 
maximum clade credibility trees from these studies from online reposi- 
tories or directly from the authors (Extended Data Table 4). 


Colonization and branching times 

The nodes selected in the dated trees for estimates of colonization and
branching times are given for each taxon in Supplementary Data 1. Our
node selection approach was as follows. For cases in which samples
representing species or populations from archipelagos formed a
monophyletic clade consisting exclusively of archipelago individuals, we
used the stem age of this clade as colonization time. For cases in which
only one individual of the archipelago was sampled, we used the length of
the tip leading to that individual, which is equivalent to the stem age.
For cases in which the archipelago individuals were embedded in a clade
containing mainland individuals of the same species (that is, paraphyly
or polyphyly), we assumed (based on morphological characteristics)
that this is due to incomplete lineage sorting of the insular and mainland
lineages, and we therefore used the most recent common ancestor
node of the archipelago individuals, or the crown node when the most
recent common ancestor node coincides with the crown. For these latter
cases, using the stem would most likely have overestimated
the colonization time, as we assume that colonization happens from
the mainland to the archipelago. For such cases, we applied the ages
using the ‘MaxAge’ option in DAISIE, which integrates over the possible
colonization times between the present and the upper bound. A
robustness test of our results to node choice is given in ‘Sensitivity to
alternative divergence times and tree topologies’.
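
The node selection rules above can be summarized as a small decision
function. This is a minimal sketch under stated assumptions: the
`IslandClade` class, its field names and the labels "precise" and
"max_age" are hypothetical and were introduced only for illustration;
they are not DAISIE identifiers.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IslandClade:
    """Hypothetical summary of how an archipelago taxon sits in a dated tree."""
    n_island_samples: int      # archipelago individuals included in the tree
    island_monophyletic: bool  # island samples form an exclusive clade
    stem_age: float            # Myr, split from the closest non-island relative
    island_mrca_age: float     # Myr, MRCA (or crown) of the island individuals


def colonization_age(clade: IslandClade) -> Tuple[float, str]:
    """Return (age in Myr, treatment of that age) following the rules in the text."""
    if clade.n_island_samples < 1:
        raise ValueError("at least one archipelago individual is required")
    if clade.n_island_samples == 1 or clade.island_monophyletic:
        # Single tip or exclusive island clade: the stem age is used as the
        # colonization time (for a single tip, the tip length equals the stem age).
        return clade.stem_age, "precise"
    # Island samples nested among mainland samples of the same species:
    # the island MRCA (or crown) age is used only as an upper bound.
    return clade.island_mrca_age, "max_age"


if __name__ == "__main__":
    print(colonization_age(IslandClade(3, True, 2.1, 0.8)))
    print(colonization_age(IslandClade(4, False, 3.0, 0.6)))
```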

For a total of 19 endemic taxa we could not obtain sequences, but we
could allocate them to a specific island clade (for example, Hawaiian
honeycreepers and solitaires). These were added as missing species to
that clade. For 96 non-endemic taxa we could not obtain sequences of
individuals from the archipelago, but we could obtain sequences from
the same species from different regions. For these cases, we used the
crown or the stem age of the species as an upper bound for the age of
the colonization event, using the ‘Non_endemic_MaxAge’ option in
DAISIE. Finally, for 124 taxa (20.8%) no sequences of individuals from the
archipelago were available in GenBank and we were not able to obtain
samples for sequencing from the species or from close relatives. We
assumed these cases constituted independent colonizations that could
have taken place at any time between the origin of the archipelago and
the present, and applied the ‘Non_endemic_MaxAge’ and ‘Endemic_MaxAge’
options in DAISIE with a maximum age equal to the archipelago
age. DAISIE makes use of the information described above; further
information has been described previously.


Global dataset characteristics 

Data points from taxa of the same archipelago were assembled into 41
archipelago-specific datasets. These 41 datasets were in turn assembled
into a single dataset (D1), which was analysed with DAISIE (D1
DAISIE R object, available in Mendeley Data,
https://doi.org/10.17632/sy58zbv3s2.2). This dataset (Supplementary
Data 1) has a total of 596 taxa (independent colonization events plus
species within radiations), covering 491 species from 203 different
genera and 8 orders. All taxa were included in the analyses: not only
those which we sampled in phylogenies, but also those for which
sequences or phylogenies could not be obtained and which were included
following the approaches described in ‘Colonization and branching
times’. A summary of diversity and sampling per archipelago is provided
in Extended Data Table 1.


Sampling completeness 

In total, we produced new sequences from 252 new individuals, comprising
90 different species from 45 different genera, covering an additional
110 colonization events that had not been sampled (that is, populations
from islands for which the species had not been sampled before). For 
at least 12 of these 90 species, we found no previous sequences in
GenBank, including island endemics from Comoros, Galápagos, Rodrigues
and São Tomé (Supplementary Table 5). The new sequences from 252
individuals increase the molecular sampling for extant colonization
events from 60% (223 out of 373) to 89% (332 out of 373). If we include
historically extinct colonizations, we increased the molecular sampling
from the existing 54% (269 out of 502) of colonization events to 75%
(379 out of 502). We also substantially increased molecular sampling
of continental relatives, adding 78 new individuals from the continent
or islands surrounding our archipelagos, covering 43 different species.
The percentage of taxa sampled in phylogenies varied widely between 
archipelagos (Fig. 1 and Extended Data Table 1). For 8 archipelagos 
(Bermuda, Fernando de Noronha, Pitcairn, Rapa Nui, Rodrigues, Saint 
Helena, Society Islands and Tonga) less than 50% of the species were 
sampled in phylogenies, and thus the majority of the species for these
island groups were added with maximum ages and endemicity status.
For 3 archipelagos, which accounted for more than a third of the total
species, over 90% of the species were sampled in phylogenies. 
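
The sampling percentages quoted in this section follow directly from the
reported counts. The short check below recomputes them; the helper name
`percent_sampled` and the dictionary labels are hypothetical, and the
counts are those given in the text.

```python
def percent_sampled(sampled: int, total: int) -> int:
    """Percentage of colonization events with molecular data, rounded to an integer."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * sampled / total)


# Counts as reported in the text: (events sampled, total events).
COUNTS = {
    "extant events, GenBank only": (223, 373),
    "extant events, with new sequences": (332, 373),
    "all events, GenBank only": (269, 502),
    "all events, with new sequences": (379, 502),
}

if __name__ == "__main__":
    for label, (sampled, total) in COUNTS.items():
        print(f"{label}: {percent_sampled(sampled, total)}%")
```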


DAISIE 
We used the method DAISIE to estimate rates of species accumulation
(colonization, speciation and extinction) on the archipelagos. The
model assumes that after the origin of an island, species can colonize
from a mainland pool. Once a species has colonized, it may remain similar
to its mainland ancestor (non-endemic species), become endemic
through anagenetic speciation (a new endemic species is formed without
lineage splitting on the island), split into new species via cladogenetic
speciation and/or go extinct. A carrying capacity (that is, the maximum
number of species each colonist lineage can attain) is implemented,
such that rates of cladogenesis and colonization decline with increasing
number of species in the colonizing clade.
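
To make the role of the carrying capacity concrete, the sketch below
shows one way a rate can decline with the number of species in a clade.
It assumes a linear decline that reaches zero at the carrying capacity;
the text states only that the rates decline, so the linear form and the
function name `diversity_dependent_rate` are assumptions made for
illustration.

```python
import math


def diversity_dependent_rate(rate0: float, n_species: int, carrying_capacity: float) -> float:
    """Rate of cladogenesis or colonization for a clade with n_species species.

    Assumes a linear decline: rate0 when the clade is empty, 0 at or above
    the carrying capacity. An infinite carrying capacity gives a constant rate.
    """
    if carrying_capacity <= 0:
        raise ValueError("carrying_capacity must be positive")
    return max(0.0, rate0 * (1.0 - n_species / carrying_capacity))


if __name__ == "__main__":
    for n in (0, 5, 10):
        print(n, diversity_dependent_rate(1.0, n, 10))
    print("no limit:", diversity_dependent_rate(1.0, 5, math.inf))
```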

The only effect of anagenesis in DAISIE is that the colonizing species
becomes endemic, because further anagenesis events on the
endemic species do not leave a signature in the data. However, the
rate of anagenesis is not systematically underestimated. Suppose the
rate was higher; it would then follow that colonizing species would also
become endemic faster, and we would see more endemic species. Thus,
the number of endemic species determines the rate of anagenesis, and
DAISIE estimates the true rate of anagenesis without systematic bias.
Further anagenesis events do not have an effect on the state variables,
and hence do not enter the equations.

In its parameterization of extinction, DAISIE accounts for the fact
that there may have been several lineages that were present on the
insular system in the past but that went completely extinct due to natural
causes, leaving no extant descendants. Simulations have shown that
the rate of natural extinction is usually well estimated in DAISIE (see
‘Measuring precision and accuracy of method’ and a previously published
study). Studies on phylogenies of single clades suggest that phylogenetic
data on only extant species provide less information on extinction than on
speciation (or rather diversification rates). However, there is information
content in such data, especially when diversification dynamics
are dependent on diversity. Moreover, here we use colonization times
in addition to phylogenetic branching times to estimate extinction
rates, and we are estimating hyperparameters of the theoretically and
empirically suggested relationship of extinction with area. Finally, we
use data from many independent colonizations, which increases the
power of our statistical method considerably and decreases the bias,
as maximum likelihood is known to asymptotically provide unbiased
estimates.


Estimating global hyperparameters 
Our aim is to examine the dependencies of the parameters that govern
species assembly (colonization, extinction, cladogenesis and anagenesis
(CES rates), and carrying capacity) on the features of archipelagos (area
and isolation). We developed a method to estimate global hyperparameters
that control the relationship between two key archipelago features
(area and isolation) and archipelago-specific (local) CES rates. One can
estimate directly from the global dataset the shape of the relationship
between isolation and colonization rate that maximizes the likelihood
for the entire dataset.

Our method finds the hyperparameters that maximize the likelihood
of the entire dataset, that is, the sum of the log-likelihoods for each
archipelago. We tested the hypothesis that area and distance from the
nearest mainland have an effect on CES rates (cladogenesis, anagenesis,
extinction and colonization). If an effect was identified, we also
estimated the scaling of the effect. We developed a set of a priori models
in which the CES rates are affected by archipelago features as is often
assumed in the island biogeography literature (Supplementary Table 1).
For the a priori models, we considered that CES rates are determined by
a power function of area or distance. In the power function,
$par = par_0 I^h$, where $par$ is the CES rate (for example, local rate
of colonization), $par_0$ is the initial value of the biogeographical
rate (for example, global initial rate of colonization), $I$ is the
physical variable (area or distance) and $h$ is the strength of the
relationship. The exponent $h$ can be negative or positive depending on
the nature of the relationship. $par_0$ and $h$ are the hyperparameters.
If the exponent $h$ is estimated as zero, there is no relationship
between $I$ and the parameter. By including or excluding $h$
from the different relationships, we can compare different models with
the effects switched on or off (Supplementary Table 1; for example,
in model M1 all relationships are estimated, whereas in model M2 the
exponent of the relationship between anagenesis and distance is fixed
to zero and thus anagenesis does not vary with distance).
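
The power function and the summed log-likelihood described above can be
sketched as follows. This is an illustration under stated assumptions,
not the DAISIE implementation: the `hyper` layout (rate name mapped to
initial value, feature key and exponent), the archipelago dictionaries
and the `loglik_fn` callable are all hypothetical, and a real analysis
would supply the DAISIE likelihood in place of `loglik_fn`.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical layout: rate name -> (par0, archipelago feature key, exponent h)
Hyper = Dict[str, Tuple[float, str, float]]
Archipelago = Dict[str, float]


def local_rate(par0: float, feature: float, h: float) -> float:
    """Power function par = par0 * I**h; h = 0 switches the effect off."""
    return par0 * feature ** h


def global_loglik(
    hyper: Hyper,
    archipelagos: List[Archipelago],
    loglik_fn: Callable[[Archipelago, Dict[str, float]], float],
) -> float:
    """Sum of per-archipelago log-likelihoods for one set of hyperparameters."""
    total = 0.0
    for arch in archipelagos:
        # Local rates for this archipelago follow from its area or distance.
        rates = {
            name: local_rate(par0, arch[key], h)
            for name, (par0, key, h) in hyper.items()
        }
        total += loglik_fn(arch, rates)
    return total


if __name__ == "__main__":
    islands = [{"area": 1.0, "distance": 100.0}, {"area": 4.0, "distance": 900.0}]
    hyper = {"cladogenesis": (2.0, "area", 0.5), "colonization": (0.1, "distance", -0.5)}
    for island in islands:
        print({k: local_rate(p, island[f], h) for k, (p, f, h) in hyper.items()})
```

Fixing an exponent to zero in `hyper` corresponds to switching a relationship off, which is how nested models of this kind can be compared.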

In addition to the a priori models, we considered a set of post hoc
models with alternative shapes of relationships. We fitted two types of
post hoc models: power models and sigmoid models (Supplementary
Table 1). In the post hoc power models, we modelled all parameters as
in the a priori models, except for cladogenesis: we allowed cladogenesis
to be dependent on both area and distance. The reason for this
is that we found that the predicted number of cladogenetic species
under the a priori models was not as high as observed, so we examined
whether including a positive effect of distance would improve the fit.
We described the relationship between area, distance and cladogenesis
using different functions: one model in which there is an additive effect
of area and distance (M15); and three models (M16, M17 and M18) in
which the effect of area and distance is interactive. In addition, we fitted
a model identical to M16 but with one parameter less (M19). The reason
for this was that this parameter (y) was estimated to be zero in M16.

In the post hoc sigmoid models, we allowed the relationship between
distance and a given parameter to follow a sigmoid rather than a power
function. The rationale for this was that we wanted to investigate whether,
for birds, the effect of distance on a parameter only starts to operate
after a certain distance from the mainland, as below certain geographical
distances archipelagos are within easy reach for many bird species
by flight, so that at these distances the island behaves almost as part of
the mainland from a bird’s perspective. We fitted nine different sigmoid
models (Supplementary Table 1), allowing cladogenesis, anagenesis
and colonization to vary with distance following a sigmoid function.
The sigmoid function that we used has an additional parameter in
comparison to power functions.
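
The exact sigmoid used is not given in this section. Purely to
illustrate the idea of a distance threshold, the sketch below uses a
generic three-parameter logistic curve: the rate stays close to its
initial value at short distances and changes around a midpoint `d0`.
The functional form, the function name `sigmoid_rate` and the
parameters `d0` and `k` are assumptions and should not be read as the
function fitted in the study.

```python
import math


def sigmoid_rate(par0: float, distance: float, d0: float, k: float) -> float:
    """Illustrative logistic dependence of a rate on distance.

    par0: rate at distances far below d0 (for k > 0).
    d0:   distance at which the rate has fallen to par0 / 2.
    k:    steepness; k > 0 gives a decline with distance, k < 0 an increase.
    """
    z = k * (distance - d0)
    z = max(-700.0, min(700.0, z))  # keep math.exp within floating-point range
    return par0 / (1.0 + math.exp(z))


if __name__ == "__main__":
    for d in (0.0, 250.0, 500.0, 750.0, 2000.0):
        print(d, round(sigmoid_rate(1.0, d, d0=500.0, k=0.01), 4))
```

Compared with the power function, this form carries one extra parameter (the midpoint), which is consistent with the statement above but does not identify the authors' function.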

In total, we fitted 28 candidate models (14 a priori, 14 post hoc) to 
the global dataset using maximum likelihood. We fitted each model 
using 20 initial sets of random starting parameters to reduce the risk 
of being trapped in local likelihood suboptima. We used the age of each 
archipelago (Extended Data Table 1) as the maximum age for coloniza- 
tion. We assumed a global mainland species pool M of 1,000 species. 
The product of M and the intrinsic rate of colonization (γ0) is constant
as long as M is large enough (larger than the number of island species),
and thus the chosen value of M does not affect the results. 
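
Fitting each model from several random starting points guards against
local optima. The sketch below shows the general pattern with a toy
one-dimensional objective and a simple hill climber; `toy_objective`,
`hill_climb` and `multistart_maximize` are hypothetical stand-ins and do
not reproduce the optimizer used in DAISIE.

```python
import random
from typing import Callable, Tuple


def toy_objective(x: float) -> float:
    """Toy 'log-likelihood' with a local maximum near -1 and a global one near +1."""
    return -(x * x - 1.0) ** 2 + 0.3 * x


def hill_climb(objective: Callable[[float], float], x: float,
               step: float = 1.0, tol: float = 1e-6) -> float:
    """Move uphill in steps, halving the step when neither direction improves."""
    while step > tol:
        if objective(x + step) > objective(x):
            x += step
        elif objective(x - step) > objective(x):
            x -= step
        else:
            step /= 2.0
    return x


def multistart_maximize(objective: Callable[[float], float],
                        sample_start: Callable[[random.Random], float],
                        local_search: Callable[[Callable[[float], float], float], float],
                        n_starts: int = 20, seed: int = 0) -> Tuple[float, float]:
    """Run the local search from n_starts random starts and keep the best optimum."""
    rng = random.Random(seed)
    best_x, best_val = None, float("-inf")
    for _ in range(n_starts):
        x = local_search(objective, sample_start(rng))
        val = objective(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


if __name__ == "__main__":
    print(multistart_maximize(toy_objective, lambda rng: rng.uniform(-2.0, 2.0), hill_climb))
```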

To decide which information criterion to use to select between dif- 
ferent models, we compared the performance of the BIC and the Akaike 
information criterion (AIC). We simulated 1,000 datasets each with 
models M9 and M19 and then fitted the M9, M14, M17 and M19 models
to each of these datasets using two initial sets of starting parameters
for each optimization. We found that for datasets simulated using M9
an incorrect model was preferred using AIC in 10.4% of cases, but only
in 0.11% of cases when using BIC. For datasets simulated using M19 an
incorrect model was preferred in 12.8% of cases using AIC and 11.1% of
cases using BIC. We thus compared models using BIC, as this criterion
has lower error rates.
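
For reference, the two criteria compared above differ only in how they
penalize the number of parameters. The sketch below uses the standard
definitions; the example log-likelihoods, parameter counts and sample
size are made up for illustration and are not values from the study.

```python
import math


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    if n_obs <= 0:
        raise ValueError("n_obs must be positive")
    return n_params * math.log(n_obs) - 2 * loglik


if __name__ == "__main__":
    # Hypothetical fits: (maximized log-likelihood, number of free parameters).
    fits = {"simple": (-1000.0, 8), "complex": (-996.0, 12)}
    n_obs = 500  # hypothetical number of data points
    for name, (ll, k) in fits.items():
        print(name, "AIC", round(aic(ll, k), 2), "BIC", round(bic(ll, k, n_obs), 2))
```

Because the BIC penalty grows with the sample size, it tends to favour simpler models than AIC when many data points are available.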


An alternative approach to estimating hyperparameters would be
to calculate CES rates and their uncertainty independently for each
archipelago and to then conduct a meta-analysis of the resulting data,
including archipelago area and isolation as predictors. However, errors
in parameter estimates will vary, particularly because some archipelagos
have small sample sizes (only a few extant colonization events, or
none at all; for example, Chagos) and are thus much less informative
about the underlying process. Thus, maximizing the likelihood of all
datasets together by estimating the hyperparameters (which is precisely
our aim) is preferable. For completeness, we present CES rates estimated
independently for each archipelago in Supplementary Table 6, excluding
archipelagos with fewer than 6 species and for which we sampled
less than 60% of the species in the phylogenies. However, as argued
above, we do not advocate using these parameter estimates for further
analyses because the number of taxa for some of these archipelagos
is still low and, by excluding archipelagos with fewer than six taxa, we
cannot capture the lower part of the relationship between area or
isolation and CES rates.

All DAISIE analyses were run using parallel computation on the 
high-performance computer clusters of the University of Groningen 
(Peregrine cluster) and the Museum für Naturkunde Berlin. The new
version of the R package DAISIE is available on GitHub. 


Randomization analysis 

We conducted a randomization analysis to evaluate whether there is
a significant signal of a relationship between area and distance and local
CES rates in our global dataset. We produced 1,000 datasets with the
same phylogenetic data and archipelago ages as the global dataset,
but randomly reshuffled archipelago area and distance in each dataset. We
then fitted the best post hoc model to each of these 1,000 randomized
datasets. If the maximum-likelihood estimates of exponent hyperparameters
(that is, the strength of the relationship) in the randomized
datasets were non-zero, this would indicate that the method is finding
evidence for a relationship even if there is none. If, on the other hand,
non-zero hyperparameters are estimated in the real data but not in the
randomized datasets, this would mean that there is information in the
data regarding the putative relationships.
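
One way to build such randomized datasets is sketched below. Each
archipelago keeps its phylogenetic data and age, while areas and
distances are permuted across archipelagos. The text does not say
whether area and distance were shuffled jointly or independently; this
sketch shuffles them independently, and the dictionary keys and the
function name `shuffle_features` are hypothetical.

```python
import random
from typing import Dict, List, Optional


def shuffle_features(archipelagos: List[Dict], seed: Optional[int] = None) -> List[Dict]:
    """Return copies of the archipelago records with area and distance permuted.

    All other fields (for example name, age, phylogenetic data) stay attached
    to their original archipelago. The input list is not modified.
    """
    rng = random.Random(seed)
    areas = [a["area"] for a in archipelagos]
    distances = [a["distance"] for a in archipelagos]
    rng.shuffle(areas)      # assumption: area and distance are permuted
    rng.shuffle(distances)  # independently of one another
    return [
        {**arch, "area": area, "distance": dist}
        for arch, area, dist in zip(archipelagos, areas, distances)
    ]


if __name__ == "__main__":
    # Made-up records for illustration only.
    data = [
        {"name": "A", "age": 1.0, "area": 10.0, "distance": 100.0},
        {"name": "B", "age": 5.0, "area": 2000.0, "distance": 900.0},
        {"name": "C", "age": 30.0, "area": 150.0, "distance": 3500.0},
    ]
    for replicate in range(3):
        print(shuffle_features(data, seed=replicate))
```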

The randomization analysis showed that in global datasets with 
reshuffled areas and distances the exponent hyperparameters are 
estimated as zero in most cases, whereas in the empirical global dataset
they are not (Extended Data Fig. 3). 


A posteriori simulations

We simulated 1,000 phylogenetic global datasets (41 archipelagos each)
with the maximum-likelihood hyperparameters of the best a priori
(M14) and post hoc (M19) models. We first calculated the local CES rates
for each archipelago based on their area and isolation and the
hyperparameters for the model, and then used these CES rates as the
parameters for the simulations using the DAISIE R package. The simulated
data were used to measure bias and accuracy of the method, goodness of
fit and the ability of our method to recover observed island
biogeographical diversity patterns (see ‘Measuring precision and accuracy
of method’ and ‘Measuring goodness of fit’ sections).


Measuring precision and accuracy of method 

DAISIE estimates the CES rates with high precision and little bias. 
We conducted parametric bootstrap analyses to assess whether the 
ability to estimate hyperparameters from global datasets is also good 
(Extended Data Fig. 2) and to obtain confidence intervals on param- 
eter estimates (Extended Data Table 5). We used DAISIE to estimate 
hyperparameters from the M14 and M19 simulated datasets (1,000 
replicates each). We measured precision and accuracy by comparing
the distribution of parameters estimated from the 1,000 simulated
datasets with the real parameters used to simulate the same datasets.
To check whether maximum-likelihood optimizations of the simulated
global datasets converge to the same point in parameter space, we first
performed a test on a subset of the simulated data. We ran optimizations
with 10 random sets of initial starting values for each of 10 simulated
datasets. All optimizations converged to the same likelihood and a
very similar hyperparameter set; therefore, we are confident that we
found the global optimum for each simulated global dataset, even for
models with many parameters.
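
The comparison between estimates from simulated datasets and the
generating values can be summarized by a bias, a spread and a percentile
interval. The sketch below is a generic parametric-bootstrap summary
under stated assumptions: the function name `bootstrap_summary`, the
nearest-rank percentile rule and the example numbers are illustrative
and are not taken from the study.

```python
import statistics
from typing import Dict, Sequence


def bootstrap_summary(estimates: Sequence[float], true_value: float,
                      alpha: float = 0.05) -> Dict[str, object]:
    """Summarize estimates obtained from datasets simulated with a known value.

    bias: mean estimate minus the generating value (accuracy).
    sd:   standard deviation of the estimates (precision).
    ci:   nearest-rank (alpha/2, 1 - alpha/2) percentile interval.
    """
    if len(estimates) < 2:
        raise ValueError("at least two estimates are required")
    ordered = sorted(estimates)
    n = len(ordered)
    lower = ordered[round((n - 1) * alpha / 2)]
    upper = ordered[round((n - 1) * (1 - alpha / 2))]
    return {
        "bias": statistics.fmean(ordered) - true_value,
        "sd": statistics.stdev(ordered),
        "ci": (lower, upper),
    }


if __name__ == "__main__":
    # Made-up estimates of an exponent whose generating value was 0.25.
    fake_estimates = [0.21, 0.24, 0.25, 0.26, 0.27, 0.23, 0.28, 0.25, 0.22, 0.26]
    print(bootstrap_summary(fake_estimates, true_value=0.25))
```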


Measuring goodness of fit 

We measured how well the preferred models fitted the data using dif- 
ferent approaches. First, we examined whether our models successfully 
reproduced the diversity patterns found on individual archipelagos. 
We calculated the total number of species, cladogenetic species and 
independent colonizations in each archipelago for each of the 1,000 
simulated datasets. We then plotted these metrics versus the observed 
values in the empirical data (Fig. 3 and Extended Data Fig. 4). Our
preferred models have a slight tendency to overpredict species richness
when there are few species and underpredict it when there are many.
We do not have a clear explanation for this. This slight deviation does
not seem to be due to an additional dependence on area or distance,
so an explanation should be sought in other factors that we did not
model. We note that the fact that all three plots show this tendency
rather than only one is to be expected because the three metrics of
species richness are not entirely independent, with total species rich- 
ness being the sum of the other two. 

Second, we examined whether the models successfully predict the
empirical relationships between area, distance and diversity metrics 
(total species, cladogenetic species, and number of independent 
colonizations). We fitted generalized linear models for each diversity 
metric, with quasi-Poisson family errors and log area (or distance) 
as predictors. We then repeated this across 1,000 independent sets 
of simulated data for the 41 archipelagos and compared the mean of 
slopes and intercepts for archipelago area and archipelago isolation 
to the equivalent estimates for the empirical data (Fig. 4).

Third, we estimated the pseudo-R² of the best model (M19) as a
measure of the explanatory power of the model. We simulated two
independent sets of 10,000 global datasets under the M19 model (set 1
and set 2). We calculated the mean total number of species, number of
cladogenetic species and colonizations for each archipelago across all
datasets from set 1. For each diversity metric, we calculated a pseudo-R²
(pseudo-R² observed) for which the total sum of squares was obtained
from the empirical data and the residual sum of squares was calculated
from the differences between empirical values and expected values (that
is, the simulation means). As the model is inherently stochastic, even
if the model is an accurate and complete reflection of the underlying
processes, the pseudo-R² would tend to be <1. To estimate the
distribution of pseudo-R² expected under the model, we treated the set-2
simulations as data and estimated the pseudo-R² for each (pseudo-R²
simulated). We then calculated the ratio of the pseudo-R²-observed
values over the 10,000 pseudo-R²-simulated values. A ratio approaching
1 would indicate that the model is explaining the observed data as well as
the average dataset simulated under this process (Extended Data Fig. 5).
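
The pseudo-R² calculation can be sketched as follows, assuming the usual
form 1 - RSS/TSS with the total sum of squares taken around the mean of
the observed values. The function names and the example numbers are
hypothetical; the expected values would be the per-archipelago means
across the set-1 simulations.

```python
from typing import List, Sequence


def pseudo_r2(observed: Sequence[float], expected: Sequence[float]) -> float:
    """1 - RSS/TSS, with TSS from the observed values and RSS from observed - expected."""
    if len(observed) != len(expected) or len(observed) < 2:
        raise ValueError("observed and expected must have equal length >= 2")
    mean_obs = sum(observed) / len(observed)
    tss = sum((o - mean_obs) ** 2 for o in observed)
    rss = sum((o - e) ** 2 for o, e in zip(observed, expected))
    if tss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - rss / tss


def r2_ratios(r2_observed: float, r2_simulated: Sequence[float]) -> List[float]:
    """Ratio of the observed pseudo-R2 to each pseudo-R2 from simulated datasets."""
    return [r2_observed / r2 for r2 in r2_simulated]


if __name__ == "__main__":
    # Made-up species counts for four archipelagos.
    observed = [3.0, 10.0, 25.0, 60.0]
    expected = [4.0, 12.0, 22.0, 55.0]
    print(pseudo_r2(observed, expected))
```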


Sensitivity to alternative divergence times and tree topologies 

Despite having sampled many new individuals from islands world- 
wide, given the wide geographical scale of our study we still rely on 
sequence data for thousands of individuals submitted to GenBank 
over the years. Whenever multi-loci analyses including our focal taxa 
were available we used them; however, these are rare (Extended Data 
Table 4). Therefore, the majority of our phylogenies are based ona 
small number of genes, and most on a single gene, cytochrome b, 
which is the most widely sequenced mitochondrial marker in birds. 
Although some studies on island birds have shown that colonization 
and diversification times derived from mitochondrial trees often do 
not differ much from those obtained using multiple loci, it is possible
that in some cases the scaling and topologies of the trees might have
been more accurate had we used multiple loci. This is particularly
relevant for recent island colonists, given incomplete lineage sorting.
An additional shortcoming of relying on published sequence data is
that many of our DNA alignments have substantial sections with
missing data (for example, because only one small section of the gene
could be sequenced and was uploaded to GenBank), which has been
shown to lead to biases in branch lengths and topology. While future
studies using phylogenomic approaches may address these issues,
obtaining tissue samples for all of these taxa will remain an obstacle
for a long time.

Although DAISIE does not directly use topological information (only
divergence times are used), it is possible that the true topology for a
clade may differ from that of the gene tree that we have estimated and
this could have an effect on our results by (1) affecting colonization and
branching times (addressed in the paragraph below); or (2) altering
the number of colonization events. Alternative topologies may have
led to an increase or decrease in colonization events; for instance,
some species that appear to have colonized an archipelago only once
may have colonized multiple times and if these re-colonizations are
recent they may go undetected when using one or few loci. As with
any phylogenetic study, we cannot rule out this possibility, but we
assume that recent re-colonization of the archipelagos in our dataset
by the same taxon is rare, as these are all oceanic and isolated. For
archipelago lineages with cladogenesis (26 out of 502 lineages), alternative
topologies could include non-monophyly of island radiations, with the
corollary being that they would be the result of multiple colonization
events. However, this seems improbable for these isolated and
well-studied radiations, for which morphological evidence (for example,
HBW) is consistent with their monophyly as supported by existing
molecular data.

Regarding scaling of divergence times, we assessed how uncertainty
in our estimated node ages could influence our results by running an
analysis of 100 datasets. For each dataset we sampled the node ages
(that is, colonization and branching times) at random from a uniform
distribution centred on the posterior mean for that node in the BEAST
tree and extending twice the length of the highest posterior density
(HPD) interval. For example, for a node with a 95% HPD interval of
2-3 million years in our trees, the uniform distribution was set to
between 1.5 and 3.5 million years. The HPD interval will capture
uncertainty under the selected phylogenetic and substitution models for
the loci that we used, but we conduct our sensitivity analysis over a
broader interval to accommodate the potential that the selected models and
gene trees are inadequate. For cases in which using this approach meant
that the lower bound of the uniform distribution was less than 0, we
assigned a value of 0.00001 million years to the lower bound. We fitted
the 9 best models to the 100 datasets using 5 sets of initial starting
parameters for each model (total 4,500 optimizations). We found that
parameter estimates across the 100 datasets did not differ strongly from
those in the main dataset (Supplementary Table 7). Notably, model selection
was unaffected, with the M19 model being selected for all 100 datasets.
This is because a lot of the information used for model selection
comes from the other sources of information that DAISIE uses (island age,
number of species and endemicity status) rather than colonization or
branching times.
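
The sampling interval for each node age can be sketched as below. The
interval is centred on the posterior mean, spans twice the HPD length,
and has its lower bound replaced by 0.00001 million years when it would
fall below zero. The worked example in the text (HPD of 2-3 million
years giving 1.5-3.5) implies a posterior mean of 2.5, which is assumed
here; the function names are hypothetical.

```python
import random
from typing import Tuple


def node_age_bounds(posterior_mean: float, hpd_low: float, hpd_high: float,
                    floor: float = 1e-5) -> Tuple[float, float]:
    """Bounds (Myr) of the uniform distribution used to resample a node age."""
    if hpd_high < hpd_low:
        raise ValueError("hpd_high must not be smaller than hpd_low")
    width = 2.0 * (hpd_high - hpd_low)  # twice the length of the HPD interval
    lower = posterior_mean - width / 2.0
    upper = posterior_mean + width / 2.0
    if lower < 0.0:
        lower = floor  # the text assigns 0.00001 Myr in this case
    return lower, upper


def sample_node_age(posterior_mean: float, hpd_low: float, hpd_high: float,
                    rng: random.Random) -> float:
    """Draw one node age uniformly from the widened interval."""
    lower, upper = node_age_bounds(posterior_mean, hpd_low, hpd_high)
    return rng.uniform(lower, upper)


if __name__ == "__main__":
    rng = random.Random(42)
    print(node_age_bounds(2.5, 2.0, 3.0))
    print(node_age_bounds(0.2, 0.0, 0.6))
    print(sample_node_age(2.5, 2.0, 3.0, rng))
```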

The maximum-likelihood parameters of the M19 model and the
resulting area and isolation dependencies for datasets D1 to D6
(discussed below) are shown in Extended Data Fig. 6 and the DAISIE R
objects including these alternative datasets are available in Mendeley
Data (https://doi.org/10.17632/sy58zbv3s2.2).

To account for uncertainty in the rates of molecular evolution,
we repeated all BEAST dating analyses for markers that were not
cytochrome b using (1) the previously published cytochrome b rate
(dataset D1, equal to main dataset) and (2) previously estimated
marker-specific rates, which have also been widely used in the literature
(dataset D2). Although the trees dated using the marker-specific rates
provide younger ages, we found that the DAISIE results were very similar
using either approach (same model preferred and similar parameters).
Therefore, in the main text we only discuss the results of analyses of D1,
that is, applying the cytochrome b rate to all markers.

For some taxa, we did not use the stem age as the estimate of colo- 
nization time, and instead used alternative nodes (see ‘Colonization 
and branching times’). To test whether our choice of nodes affects 
our main conclusions, we recoded all such taxa by extracting the stem
ages and used these ages as an upper bound for colonization (DAISIE
MaxAge option). We fitted all 28 models to this new dataset (D3) and
found that the M19 model is preferred and that the parameters and
area or isolation relationships vary only slightly from those of the main
analysis. We therefore conclude that our results are robust to the node
selection approach.

If extinction has been high on the mainland, or if we failed to sample
the closest relatives of the island taxa, this could lead to an
overestimation of colonization times when using the stem age as the precise
time of colonization. To investigate how this could have influenced our
results, we ran analyses of datasets in which we allowed colonization to have
happened at any time since the stem age (that is, the time of divergence
from the nearest relative of the taxon on the mainland). For this we used
the DAISIE options Endemic MaxAge or NonEndemic MaxAge, which
integrate over all possible ages between the given maximum age and
the present (or the first branching event within the archipelago for
cases in which cladogenesis has occurred). We repeated this analysis
coding all stem ages as maximum ages (D4), or coding only the 25%
older stem ages as maximum ages (to account for the fact that older
stems have the potential to have more bias) (D5). We also ran analyses
on 100 datasets (D6) for which we assigned precise younger ages by
randomly selecting a value between the stem age and the present (or
crown age for cladogenetic groups). For all of these datasets (D4–D6),
we found that the same model (M19) was preferred, but the initial values
of the biogeographical rates (cladogenesis, extinction, colonization
and anagenesis) were estimated to be higher than in the main dataset.
Notably, the exponent hyperparameters were similar to those in the
main dataset, meaning that the shape of the relationships between
parameters and area or isolation is not much affected (Extended Data
Fig. 6). The only exception is perhaps anagenesis, for which the
relationships varied more markedly (with isolated islands achieving very
high rates for this parameter) but still agreed with our main conclusions.
Anagenesis is in general the most difficult parameter to estimate.
Thus, our conclusions are robust to the colonization times potentially
being younger than those in our main dataset.
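
The construction of the D6 replicates described above can be illustrated with a minimal sketch. It assumes that each taxon is summarized by a stem age and, for cladogenetic groups, a crown age, and that ages are drawn uniformly; the function names, the taxon labels and the example ages are hypothetical and are not taken from the DAISIE package or from our datasets.

```python
import random


def sample_colonization_age(stem_age, lower_bound, rng):
    """Draw one colonization time uniformly between lower_bound and stem_age.

    Ages are in million years before present; lower_bound is 0 (the present)
    for single lineages and the crown age for cladogenetic groups.
    """
    if not 0.0 <= lower_bound <= stem_age:
        raise ValueError("need 0 <= lower_bound <= stem_age")
    return rng.uniform(lower_bound, stem_age)


def make_replicates(taxa, n_replicates=100, seed=1):
    """Build n_replicates datasets of randomly assigned colonization ages.

    taxa is a list of (name, stem_age, crown_age_or_None) tuples.
    Uniform sampling is an assumption of this sketch.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        dataset = {}
        for name, stem_age, crown_age in taxa:
            lower = crown_age if crown_age is not None else 0.0
            dataset[name] = sample_colonization_age(stem_age, lower, rng)
        replicates.append(dataset)
    return replicates


# Hypothetical taxa: one single lineage and one cladogenetic group.
example_taxa = [("taxon_A", 5.0, None), ("taxon_B", 3.0, 1.0)]
replicates = make_replicates(example_taxa, n_replicates=100)
```

Each replicate would then be analysed in the same way as the main dataset; fixing the seed only serves to make the illustration repeatable.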


Sensitivity to archipelago selection and isolation metrics 
The results of the following sensitivity analyses are presented in
Supplementary Data 3 and the DAISIE R objects that include these
alternative datasets are available in Mendeley Data
(https://doi.org/10.17632/sy58zbv3s2.2).

To test whether the inclusion of both true archipelagos and single
islands in our dataset could affect the results, we repeated analyses
excluding single island units and found that the same model was
preferred. The estimated initial rate of cladogenesis (λ^c_0) is higher if we
exclude single islands, but this parameter is not different from a
distribution of parameters estimated from datasets generated using a
stratified-random sampling of both archipelagos and single islands.

Alternative isolation metrics to D_m have been shown to explain
varying and often higher amounts of variation in species richness on
islands. We tested two alternative metrics: distance to the nearest
larger or equivalent-sized landmass (D_b), and the mean between D_m
and D_b (metrics given in Supplementary Data 2). We found that the
same DAISIE model with very similar parameters was preferred in both
cases, and we therefore used only the D_m metric, as this is more similar
to the original model of MacArthur and Wilson.


The Mascarenes (Mauritius Island, Reunion and Rodrigues) are often
treated as a single biogeographical unit in analyses. We chose to analyse
them as independent units because (1) the distance between islands is
much greater than our threshold for archipelago definition (more than
500 km between Mauritius Island and Rodrigues; more than 170 km
between Reunion and Mauritius Island); (2) only two species of our
target group are shared between the islands (Terpsiphone bourbonnensis
is found in Mauritius Island and Reunion; and Psittacula eques
is found in Mauritius Island and extirpated from Reunion), suggesting
low connectivity; (3) although there are three clades whose branching
events took place within the Mascarenes (Coracina, Pezophaps and
Raphus, and Zosterops), the remaining species result from independent
colonizations, suggesting that the three islands behave mostly as
three different biogeographical units. We nevertheless ran an analysis
treating the islands as a single archipelagic unit and found that the
same model was preferred and with similar parameter estimates, and
we therefore discuss only the results treating them as separate.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability
New sequence data produced for this study have been deposited in
GenBank with the accession codes: MH307408-MH307656. The following
datasets have been deposited in Mendeley: DNA alignments
(https://doi.org/10.17632/vf95364vx6.1), new phylogenetic trees produced
for this study (https://doi.org/10.17632/p6hmSw8s3b.2), and DAISIE
R objects (https://doi.org/10.17632/sy58zbv3s2.2). The 11 previously
published trees are available upon request.


Code availability 
The custom computer code used for this study is freely available in the
DAISIE R package (https://github.com/rsetienne/DAISIE).

Extended Data Fig. 2 | Bootstrap precision estimates of the parameters of
the M19 model. Parametric bootstrap analysis fitting the M19 model to
1,000 global datasets simulated with maximum-likelihood parameters of the
M19 model. Plots are frequency histograms of estimated parameters. Black
lines show the median estimated values across all simulations and the blue
lines the simulated values. Dashed lines show 2.5–97.5 percentiles.
Parameters are explained in Supplementary Table 1. Bootstrap parameter
estimates for the M14 model are shown in Extended Data Table 5.

Extended Data Fig. 3 | Randomization analysis of the M19 model.
Distribution of global hyperparameters estimated from each of 1,000 datasets
with the same phylogenetic data as our main global dataset but randomly
reshuffling archipelago area and isolation among the 41 archipelagos in the
dataset. Grey histograms show DAISIE maximum-likelihood parameter
estimates for the M19 model. Red arrows show the estimated parameter from
the real data. In most cases, the hyperparameters describing the exponent of
the power models (x, α, β and d0) are estimated as zero in the reshuffled
datasets, which is not the case in the real data (red). Parameters are
explained in Supplementary Table 1.
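
The reshuffling step behind this randomization analysis can be sketched as follows. The sketch assumes that area and isolation are permuted independently of each other, which the caption does not specify; the function name and the dictionary layout are hypothetical, and the three example rows use area and distance values from Extended Data Table 1.

```python
import random


def reshuffle_area_and_isolation(archipelagos, seed=None):
    """Return a copy of the archipelago list with area and distance permuted.

    The phylogenetic data stay attached to each archipelago name; only the
    physical descriptors are reassigned. Areas and distances are shuffled
    independently (an assumption of this sketch).
    """
    rng = random.Random(seed)
    areas = [a["area_km2"] for a in archipelagos]
    distances = [a["distance_km"] for a in archipelagos]
    rng.shuffle(areas)
    rng.shuffle(distances)
    return [
        {"name": a["name"], "area_km2": area, "distance_km": dist}
        for a, area, dist in zip(archipelagos, areas, distances)
    ]


observed = [
    {"name": "Aldabra Group", "area_km2": 180, "distance_km": 261},
    {"name": "Bermuda", "area_km2": 53, "distance_km": 1040},
    {"name": "Galapagos", "area_km2": 7880, "distance_km": 928},
]

# One reshuffled dataset; the analysis in the caption used 1,000 of them.
reshuffled = reshuffle_area_and_isolation(observed, seed=42)
```

Refitting the model to each reshuffled dataset gives the null distribution of hyperparameters against which the estimates from the real data are compared.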
Parameter symbols are described in Supplementary Table 1. b, Estimated


Extended Data Table 1 | Archipelago characteristics and references for island geological ages

Archipelago              Area     Distance to        Age      Species in   Species sampled   Colonisations sampled
                         (km²)    nearest mainland   (Ma)     focal group  in phylogenies    in phylogenies
                                  (km)                                     (percentage)      (percentage)
Aldabra Group            180      261                1        12           11 (91.67)        11 (91.67)
Ascension                91       1537               1        0            NA (NA)           NA (NA)
Azores                   2387     1365               63       17           13 (76.47)        13 (76.47)
Bermuda                  53       1040               2        5            2 (40)            2 (40)
Canary Islands           7493     96                 24       50           44 (88)           41 (87.28)
Cape Verde               4033     570                158      10           10 (100)          10 (100)
Chagos                   56.13    1510               0.0065   0            NA (NA)           NA (NA)
Chatham                  966      650                3        14           13 (92.86)        13 (92.86)
Christmas Island         135      345                10       4            2 (50)            2 (50)
Cocos (Costa Rica)       24       491                24       4            2 (50)            2 (50)
Cocos (Keeling)          14.2     1054               0.003    0            NA (NA)           NA (NA)
Comoros                  2033     297                15       41           40 (97.56)        28 (96.55)
Fernando de Noronha      26       360                33       3            1 (33.33)         1 (33.33)
Galapagos                7880     928                4        27           26 (96.3)         8 (100)
Gough                    91       2582               4        1            1 (100)           1 (100)
Guadalupe                244      254                7        11           10 (90.91)        10 (90.91)
Hawaii                   16624    3670               29.8     51           31 (60.78)        6 (75)
Juan Fernandez           5067     600                58       6            6 (100)           5 (100)
Lord Howe                14.55    571                69       11           9 (81.82)         9 (81.82)
Madeira                  798      600                18.8     19           17 (89.47)        17 (89.47)
Marianas (with Guam)     852      1800               15       19           11 (57.89)        11 (57.89)
Marquesas                1063     4750               55       19           12 (63.16)        7 (60)
Mauritius Is.            1865     867                8.9      15           9 (60)            9 (60)
New Caledonia            18576    1300               37       46           38 (82.61)        37 (82.22)
Niue                     261.46   2340               2        5            3 (60)            3 (60)
Norfolk                  346      730                3.05     14           11 (78.57)        11 (78.57)
Ogasawara                65       827                5        10           5 (50)            5 (50)
Palau                    488      815                20.1     16           10 (62.5)         10 (62.5)
Pitcairn                 428      5015               11       8            3 (37.5)          2 (28.57)
Rapa Nui                 1636     3519               0.78     2            0 (0)             0 (0)
Reunion                  2512     700                5        13           9 (69.23)         9 (69.23)
Rodrigues                109      1440               15       9            4 (44.44)         4 (44.44)
Saint Helena             123.28   1856               145      3            0 (0)             0 (0)
Samoa                    3041     2730               13.5     22           16 (72.73)        16 (72.73)
São Tomé and Príncipe    964      219                30       44           44 (100)          37 (100)
Selvagens                273      373                29       1            1 (100)           1 (100)
Seychelles Inner         242.68   1048               64       12           11 (91.67)        11 (91.67)
Society Islands          1577.8   3700               43       18           7 (38.89)         7 (38.89)
Socorro                  192      457                35       7            7 (100)           7 (100)
Tonga                    344.4    1850               4        23           10 (43.48)        10 (43.48)
Tristan da Cunha         115.4    2770               18       4            4 (100)           2 (100)


Island ages are from previously published studies.

More data are provided in Supplementary Data 2. For archipelagos closer to Madagascar, New Guinea or New Zealand than to the continent, we use those islands as the mainland.
A previous study proposed an age of 0.125 million years, but we used an older age (see Methods).
At least 2 million years (P. Hearty, personal communication).
R. Stern and M. K. Reagan, personal communication.

Extended Data Table 2 | Primer sequences used in this study


Primer name Sequence Reference 
New primers designed 

L-cytB_bird_0 ‘TCAACRACTCCCTAATYGACCT This paper 
H-cytB_bird 2 AGRAYTACTCCTGTGTTTCARGTYTC. This paper 
L-cytB_bird_2 GARACYTGAAACACAGGAGTARTYCT This paper 
H-cytB_bird_3 TAGKGGGTTGTTTGAGCCTGWTTCGTG This paper 
L-cytB_bird_3 (CACGAAWCAGGCTCAAACAACCC. This paper 
H-cytB_bird 4 GGAGTAGTADGGGTGAAATGGRATTTT This paper 
H-cytB_bird 5 GGGTGTTCTACTGGTTGGCTKCC This paper 
L-cytB_bird_5 (CCMCTCTCACAAAYCCTATTCTGA This paper 
L-cyt8_human TGAAACTTCGGATCCCTACTA This paper 


Published primers 
L481 
14995 
Lisa08 
H18767 
15917 
H16065 
Leyt®_Passer 
HoytB_Passer 
Leyt6-Mot 
HoytB-Mot 


AAAAAGCTTCCATCCAACATCTCAGCATGATGAAA 
GCCCCATCCAACATCTCAGCATGATGAAACTTCCG, 


GGC TAT GTC CTC CCA TGA GGC CAA AT 


ATGAAGGGATGTTCTACTGGTTG 


TAGTTGGCCAATGATGATGAATGGGTGTTCTACTGGTT 


GAGTCTTCAGTCTCTGGTTTACAAGAC 
(CACAGGCCTAATTAAAGCCTACCT 
TTGARAATGCCAGCTITGGGAG 
(CCAAATYGTTACAGGMCTCCTG 
GGTGAATGAGGCTAGTTGCCCA 


Primer sequences were designed for this study or are from previously published studies 


Extended Data Table 3 | The 80 alignments used in the phylogenetic analyses 


Taxonone prom Wolecular Wain source Taxonomic group Molecular Wain source 
markers) of sequences markers) of sequences 
Acrocephalus cyt-b - ‘Moho cyt-b € 
Alaudidae (family) cytb . Monn: ob 3 
Alopeccenas/ Galicoumba D2 5 Motacia one ¢ 
Anairetes cyt - Myadesiee, oh 
Anthus ote : Mysore ote+noe 
ws = . Myiarchus ctb+ND2 1 
Nitta 1 
‘Bucanetes cyte - wt ids 
Onychognattus Noe 2 
Bumtings Nesospiza/ Rowe cj-b 3 
Carduots ote 2 
Passer! Poona ote Q 
Chasiompis Elepio Noe : 
Chaunoproctus ND2 . eas) re ca 
Cinnyris notata ATPG 109 Rfopcacopes’ oe 1 
fo " 
Cisticola cyt-b - Oe cae 
Pomarea ote _ 
Cherhynenus ote 
rina oe : 
Cocoyus one ¥ 
S " 
Colaptes od 5 Progr oy 
Paitacitores ote : 
Columtitormes (re) ote : 
Pyhocorax " 5 
Copsychus ob 2 * 
oytb Sa Pyrtula cyt-b - 
Regus 1 
Crthagra / Serinus cytb A ‘ ae 
Saxiola ote = 
‘Cuculiformes (order) cyt-b * 
Sephancies 1 c 
Cyanistes cyt - = 
Cyanolanius cyt-b - Riek ath 
Seiophage potechia NO2+ATPS+ 1 
encrocopos ote : cr 
Dicrurus cytb - ain ob - 
‘Dumetella ND2 - nite Sieg. 4 
eee ome 7 sums Noe : 
on sa : Sunbis(amiy) one z 
Estrida / Erythrura cyt : Syvia oy = 
Finches, Galapagos Cocos cyt-b + Multiple 1 Temaiphone) on? Ke 
Froud eS Troglodyes! Tmyomanes NDZ ‘0 
NDS Turdus cyt-b 
regis Noo : 
Y 1 3 
Fringilla cyt-b - ane, as 
vata e 
‘Haemorhous cyt - * 
Hawaiian Honeycreepers cyt-b a ‘eave Comm) re: i 
Be Zosterops(Indan, Nlante)—ofNDB 
Humbe ote 2 
Hypspotes Nos « 
Lamprotamis Noe . 
tanies one : 
Leptosomis one 2 
Lonctura at : 
Lovie ote : 
Merooca | Eopsaltia one 2 
Minus Noe 2 


Sequences were obtained from previous studies as indicated. Main source of sequences is GenBank or the new sequences produced for this study, except for the cases noted in the table, for
which the matrix was directly obtained from a specific study. Details on molecular rates and molecular models applied to each alignment are provided in Supplementary Table 4.

Extended Data Table 4 | Previously published dated trees used

Taxonomic group             Source   Molecular markers                 Calibration method
Calypte                     112      Multiple                          Molecular rate
Cinclodes                   113      3 mtDNA and 3 nuclear             Biogeographical
Corvides                    114      Multiple                          Fossils
Corvus moriorum             115      Mitogenome                        Fossils
Ducula                      116      ND2, COI, ND3, nuclear            Secondary & Biogeography
Junco                       117      ND2 + CR + COI + ATP + nuclear    Molecular rate
Meliphagides (infraorder)   118      mtDNA and nuclear                 Fossils & Secondary
Nesillas                    119      ND2                               Molecular rate
Ptilinopus                  120      ND2, COI, ND3, nuclear            Secondary & Biogeography
Pyrocephalus                121      cyt-b, ND2, nuclear               Molecular rate
Zosterops (Pacific)         122      cyt-b, ND2, ND3, ATPase           Molecular rate

Data are from previously published studies.


Extended Data Table 5 | Bootstrap of M14 and M19 models 


Cladogenesis Extinction Colonisation Anagenesis 
Model___ Ae y be x Yo a Me A 
ag noes 0.26 7.88 0.18 31.90 0.25 0.05 042 
(0.01-0.07) (0.13 - 0.37) (1.48 - 2.45) (0.11 - 0.18) (28.86 - 89.64) (0.17 - 0.34) (0.01 - 0.16) (0.24 - 0.61) 
do cy Ho x yo a Mo B 
ug 0.08 0.027 1.95 0.18 67.26 0.29 0.058 0.38 
(0.022 0.077) (0.016-0.034)__(1.55-2.50) (0.12-0.18) _(96.35- 112.71) (021-097) __(0.02-0.19) __(0.21-0.57) 


Maximum-likelihood estimates and 95% confidence intervals of the parameters of the two best models. Confidence intervals were obtained from the bootstrap analyses. Parameter symbols are
explained in Supplementary Table 1.
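
As an illustration of how a median and a 95% interval can be read off a set of bootstrap estimates, the following minimal sketch takes the 2.5th and 97.5th percentiles of the estimates. The linear-interpolation percentile rule and the function names are assumptions of the sketch, not a description of the DAISIE implementation, and the example values are synthetic.

```python
import statistics


def percentile(sorted_values, q):
    """Percentile q (0-100) of a pre-sorted list, by linear interpolation."""
    if not sorted_values:
        raise ValueError("no values supplied")
    position = (len(sorted_values) - 1) * q / 100.0
    lower_index = int(position)
    upper_index = min(lower_index + 1, len(sorted_values) - 1)
    fraction = position - lower_index
    return (sorted_values[lower_index] * (1.0 - fraction)
            + sorted_values[upper_index] * fraction)


def bootstrap_summary(estimates):
    """Median and 2.5-97.5 percentile interval of bootstrap estimates."""
    values = sorted(estimates)
    return {
        "median": statistics.median(values),
        "lower": percentile(values, 2.5),
        "upper": percentile(values, 97.5),
    }


# Synthetic stand-in for one parameter estimated on many simulated datasets.
synthetic_estimates = [i / 1000.0 for i in range(1001)]
summary = bootstrap_summary(synthetic_estimates)
```

In practice the list of estimates would hold one fitted value per simulated dataset for a given parameter, and the summary would be computed separately for each parameter.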
All underlying data are available in the manuscript as supplementary data or on online databases.
New sequence data have been uploaded to GenBank with accession numbers MH307408-MH307656.


Other data types have been uploaded to Mendeley:

DNA alignments: https://doi.org/10.17632/vf95364vx6.1

New phylogenetic trees produced for this study: https://doi.org/10.17632/p6hmSw8s3b.2
DAISIE R objects: https://doi.org/10.17632/sy58zbv3s2.2

Ecological, evolutionary & environmental sciences study design

Study description We produced phylogenies for island birds and developed a new method for estimating rates of speciation, colonisation and
extinction from these islands and to relate them to island area and isolation on a global scale. A total of 596 bird taxa were included
in the DAISIE analyses (including those taxa for which no phylogenetic data were available but which were present on the islands). The
number of individuals sampled per taxon included in the phylogenetic analyses varied between 1 and 15.


Research sample We did not conduct experiments. Our samples are bird specimens whose DNA was used for phylogenetic analyses. Our sampling
focused on native resident terrestrial birds from 41 archipelagos (listed in Figure 1) and we considered only birds that colonise by
chance events. We thus excluded marine and migratory species. We focused on songbird-like and pigeon-like birds, which constitute
the majority of terrestrial (land-dwelling) birds on islands. We included only species from the same trophic level: we excluded aquatic
birds, birds of prey, rails and nightjars. We also excluded introduced and vagrant species. Sex and age of the individuals are not
relevant for the purposes of this study.


The full lists of species and samples are given in Supplementary Data 1 and Supplementary Information Table 3.


Sampling strategy The sample size is the number of island colonisation plus island speciation events (569). We sampled all taxa of our focal group on 
each of 41 archipelagos. 


Data collection We sampled DNA from birds for sequencing, and compiled published sequence and phylogenetic data available. Bird samples were
collected in the field by M.M., B.H.W., S.M.C., J.C.I., C.T. and L.V. New sequences were produced by K.H. and J.C.I. GenBank data and
published phylogenetic trees were compiled by L.V. New phylogenetic trees were produced by L.V. Data on island physical features
were compiled from various published sources cited in Extended Data Table 1.


Timing and spatial scale Field work was conducted between 1999 and 2017 on the island and continental regions specified in Supplementary Data Table 3.
The spatial scale is global, as field locations were located in several continents and oceans.


Data exclusions No data were excluded. 
Reproducibility The likelihood and simulation analyses conducted in this study can be reproduced using examples provided in the R package DAISIE. 


We provide examples of the code and the same data used for running these analyses (e.g. see examples at the end of
DAISIE_sim_global and DAISIE_MW_ML functions in the DAISIE R package).


Randomization This is not relevant as we did not conduct experiments. 
Blinding Blinding is not relevant, as we did not conduct experiments. 
Did the study involve field work? [X] Yes [ ] No


Field work, collection and transport 


Field conditions We conducted fieldwork on several islands worldwide in order to collect DNA samples from birds. The field conditions varied, but 
field work was only conducted when it was not raining to avoid hurting birds. The exact field conditions are not relevant because 
they do not impact the results. 


Location The 41 archipelagos/islands sampled are: Aldabra Group; Ascension; Azores; Bermuda; Canary Islands; Cape Verde; Chagos;
Chatham; Christmas Island; Cocos (Costa Rica); Cocos (Keeling); Comoros; Fernando de Noronha; Galápagos; Gough; Guadalupe;
Hawaii; Juan Fernandez; Lord Howe; Madeira; Marianas; Marquesas; Mauritius Isl.; New Caledonia; Niue; Norfolk; Ogasawara;
Palau; Pitcairn; Rapa Nui; Reunion; Rodrigues; Saint Helena; Samoa; São Tomé e Príncipe; Selvagens; Seychelles (Inner); Society;
Socorro; Tonga; Tristan da Cunha. Mainland sample locations: Angola, Andalucía (Spain), Cameroon, Equatorial Guinea, Gabon,
Madagascar, Morocco.


All relevant parameters (area, isolation, latitude, longitude, elevation, age) are listed in Supplementary Data 2. 


Access and import/export Information on collecting and export permits:
- Angola - Biodiversity Research Protocol ISCED-Huíla and the South African National Biodiversity Institute (SANBI) (M.M.)
- Cameroon - Limbe Botanical and Zoological Garden, Ministry of Scientific Research, Ministry of Forestry and Wildlife (M.M.)
- Cape Verde - Cape Verde Agriculture and Environment Ministry; Ref.: 10/10 and 18/2015 (J.C.I.)
- Comoros - Centre National de Documentation et de Recherche Scientifique, 2000 (B.H.W.)
- Equatorial Guinea - Universidad Nacional de Guinea Ecuatorial (M.M.)
- Gabon - Centre National de la Recherche Scientifique (CENAREST), Station de Recherche de l'IRET at Ipassa-Makokou, Parc de La
Lekedi, CENAREST N°AR0053/12/MENESTFPRSCIS/CG/CST/CSAR 2012 (M.M.)
- Madagascar - Ministère des Eaux et Forêts, 2002 (B.H.W.)
- Mauritius/Rodrigues - National Parks and Conservation Service (Republic of Mauritius), 1999 (B.H.W.)
- Mayotte - Direction de l'Agriculture et de la Forêt, 2000 (B.H.W.)
- Morocco - Ref: 5061/08/HCEFLCD/DLCDPN/PRN/CFF // 14-2015 (J.C.I.)
- New Caledonia, Loyalty Islands - Direction du Développement Economique, 2 Dec 2011 6101-858/PR (L.V.); 31 Jan 2014
6101-43/PR (S.M.C.)
- New Caledonia, South Province - Direction de l'Environnement Province Sud, 21 Jan 2014 Province Sud 3177-2013/ARR/DENV
(S.M.C.)
- Portugal - Regional governments of
1) Azores: 12/2016/DRA (J.C.I.)
2) Madeira: 02/2016 FAU MAD (J.C.I.)
- São Tomé e Príncipe - Direcção Geral do Ambiente, Ministério das Obras Públicas, Infraestruturas, Recursos Naturais e Ambiente
1999-present (no number) (M.M.)
- Seychelles - Bureau of Standards and Ministry of Environment, Centre National de Documentation et de Recherche
Scientifique, 2000 (B.H.W.)
- Spain - Regional governments of
1) Andalucía: SGYB/AF/FIRH/RE-35-36/13 (J.C.I.)
2) Canary Islands: Ref.: 443/02-10-2012 // Ref.: 2016/811 (J.C.I.)
- Reunion - CRBPO (Centre de Recherches sur la Biologie des Populations d'Oiseaux, Muséum National d'Histoire Naturelle,
Paris), #602, 2007 (C.T. and B.H.W.)
- Museum samples: Department of Ornithology and Mammalogy of the California Academy of Sciences (Laura Wilkinson &
Maureen Flannery); Natural History Museum at Tring (Mark Adams); Stuttgart State Museum of Natural History.


Disturbance Minimal disturbance to sites - we used mist-nets, which are placed temporarily and cause minimal impact. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.


Materials & experimental systems: Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods: ChIP-seq; Flow cytometry; MRI-based neuroimaging


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Study did not involve laboratory animals. 


Wild animals Birds were caught in the field using mist-nets and immediately released in the same location after a blood sample was taken. No
bird was injured, killed or kept captive.


The new samples collected for this study comprised 90 different species (252 individuals): Acrocephalus rodericanus; Agapornis
pullarius; Alectroenas sganzini; Anabathmis hartlaubii; Anabathmis newtonii; Anabathmis reichenbachii; Chalcophaps indica;
Chrysococcyx cupreus; Chrysococcyx lucidus; Coccyzus melacoryphus; Columba larvata; Columba malherbii; Columba thomensis;
Coracopsis vasa; Corvus albus; Crithagra burtoni; Crithagra capistrata; Crithagra mozambica; Crithagra rufobrunnea; Crithagra
sulphurata; Cyanolanius madagascarinus; Cyanomitra olivacea; Dicrurus ludwigii; Dreptes thomensis; Erythrura psittacea;
Erythrura trichroa; Estrilda astrild; Estrilda melpoda; Estrilda astrild; Euplectes albonotatus; Euplectes aureus; Euplectes
capensis; Euplectes hordeaceus; Euplectes orix; Euplectes albonotatus; Humblotia flavirostris; Lanius newtoni; Leptosomus
discolor; Lonchura cucullata; Motacilla bocagii; Myiagra caledonica; Nesoenas picturata; Nigrita bicolor; Nigrita canicapilla;
Ploceus cucullatus; Ploceus grandis; Ploceus insignis; Ploceus melanogaster; Ploceus nigerrimus; Ploceus princeps; Ploceus
sanctithomae; Ploceus velatus; Ploceus xanthops; Prinia rolleri; Prinia subflava; Progne modesta; Quelea erythrops; Quelea
quelea; Saxicola torquata; Serinus albogularis; Serinus citrinelloides; Serinus citrinipectus; Serinus flaviventris; Serinus flavivertex;
Serinus mozambicus; Serinus totta; Streptopelia senegalensis; Streptopelia decaocto; Sylvia atricapilla; Sylvia borin; Sylvia
dohrni; Terpsiphone atrochalybea; Terpsiphone rufiventer; Terpsiphone rufocinerea; Terpsiphone smithii; Terpsiphone viridis;
Treron calvus; Treron griveaudi; Treron sanctithomae; Turdus merula; Turdus olivaceofuscus; Turdus xanthorhynchus; Turtur
afer; Turtur tympanistria; Uraeginthus angolensis; Vidua macroura; Zosterops feae; Zosterops griseovirescens; Zosterops
leucophaeus; Zosterops lugubris.


Sex and age of the individuals is unknown (and not relevant for this study). 


Field-collected samples Blood samples collected in the field were stored in ethanol.


Ethics oversight No ethical approval was required as no bird was killed, injured or kept captive and we used normal procedures for mist-netting.


Note that full information on the approval of the study protocol must also be provided in the manuscript.

The stiff human foot enables an efficient push-off when walking or running, and was
critical for the evolution of bipedalism. The uniquely arched morphology of the
human midfoot is thought to stiffen it, whereas other primates have flat feet that
bend severely in the midfoot. However, the relationship between midfoot
geometry and stiffness remains debated in foot biomechanics, podiatry and
palaeontology. These debates centre on the medial longitudinal arch and have
not considered whether stiffness is affected by the second, transverse tarsal arch of
the human foot. Here we show that the transverse tarsal arch, acting through the
inter-metatarsal tissues, is responsible for more than 40% of the longitudinal stiffness
of the foot. The underlying principle resembles a floppy currency note that stiffens
considerably when it curls transversally. We derive a dimensionless curvature
parameter that governs the stiffness contribution of the transverse tarsal arch,
demonstrate its predictive power using mechanical models of the foot and find its
skeletal correlate in hominin feet. In the foot, the material properties of the
inter-metatarsal tissues and the mobility of the metatarsals may additionally influence the
longitudinal stiffness of the foot and thus the curvature-stiffness relationship of the
transverse tarsal arch. By analysing fossils, we track the evolution of the curvature
parameter among extinct hominins and show that a human-like transverse arch was a
key step in the evolution of human bipedalism that predates the genus Homo by at
least 1.5 million years. This renewed understanding of the foot may improve the
clinical treatment of flatfoot disorders, the design of robotic feet and the study of foot
function in locomotion.

When walking and running, people use the ball of the foot to apply
forces that exceed bodyweight. Because of these forces, the midfoot
experiences large sagittal-plane torques that bend the foot. A stiff
midfoot reduces the loss of propulsive work due to foot deformation
and helps to efficiently utilize the mechanical power generated by the
ankle during push-off.

The unique arch shape of the human midfoot is thought to underlie
the higher stiffness of human feet compared to other primate feet
(Extended Data Table 1). However, stiffness is not a static quantity and
muscle activity can modulate midfoot stiffness in both humans and apes.
The static stiffness due to the passive structures of the foot forms the
baseline around which muscles with similar mechanical action as the
passive tissues are likely to modulate stiffness. Therefore, understanding the
morphological features underpinning the static stiffness is crucial for
both static and dynamic conditions (Supplementary Information 1.1–1.3).

The human midfoot has two pronounced arches: the extensively
studied medial longitudinal arch (MLA) and the less-studied
transverse tarsal arch (TTA) (Fig. 1a). The MLA stiffens the midfoot
in part through a bow-string arrangement with the stiff longitudinal
fibres of the plantar fascia and a windlass-like mechanism due to toe
dorsiflexion just before push-off. In addition to the plantar fascia,
the longitudinally oriented long plantar, short plantar and
calcaneonavicular ligaments are essential for the static midfoot stiffness in humans
and other primates. However, in contrast to the plantar fascia, the
contribution of these ligaments does not depend on the height of the
MLA, as shown by their nearly equal relative contributions in both
arched human feet and flat monkey feet (Extended Data Table 1 and
Supplementary Information 1.4).

The relationship between the height or curvature of the MLA and
midfoot stiffness remains controversial. Some people have no
difficulty walking with a heel-to-toe style despite having little to no MLA.
Conflicting evidence also emerges in foot disabilities and surgical
reconstruction of the MLA when correlating MLA height with foot
flexibility, and casts further doubt on the relationship between the
MLA and midfoot stiffness. Furthermore, there are also debates over
when a stiff midfoot arose in human evolution, including what kind
of foot made the 3.66-million-year-old partly human-like footprints
at Laetoli.

Fig. 1 | Transverse curvature and stiffness. a, The human foot has two distinct
arches in the midfoot, the MLA and the TTA. Further anatomical details are
shown in Extended Data Fig. 1. The typical loading pattern during push-off in
walking and running is shown here. b, A thin and floppy sheet of paper becomes
considerably stiffer because of transversal curvature. The TTA may have a
similar role in feet. Scale bars, 5 cm.

These debates regarding the arch morphology and stiffness centre
around the MLA, the plantar fascia and other longitudinally oriented
ligaments and muscles, and do not consider the role of the TTA
(Supplementary Information 1.4). Even the definition of flatfoot relies mostly
on the height of the MLA. However, the TTA may affect midfoot
stiffness, similar to how even slightly curling a thin sheet of paper in
the transverse direction stiffens the paper longitudinally (Fig. 1b). To
investigate whether the TTA functions in this manner, we performed
three-point bending tests on arched continuum shells, mechanical
mimics of the midfoot and human cadaveric feet.


We investigated the relationship between curvature and stiffness
by modelling the TTA as a curved elastic shell in computer simulations
and physical experiments (Fig. 2a). We found that shells with greater
transverse curvature were stiffer in longitudinal bending (Fig. 2b).
However, the stiffness also depended on the thickness t, length L, width
w, Young's modulus and Poisson's ratio of the material. To isolate the
contribution of the transverse arch to midfoot stiffness, we used
scaling analysis to derive dimensionless variables for stiffness and
curvature that are normalized for material property and size differences
(Supplementary Information 2). The normalized stiffness K̂ is the ratio
of the stiffness of the curved shell to that of a flat plate that is identical
except for the curvature. The normalized curvature ĉ encapsulates the
mechanical coupling between bending out-of-plane and stretching
in-plane that is induced by the transverse curvature c, and is given by

(a) 


Collapse of the normalized data onto a master curve shows that ĉ is the
chief explanatory variable for K̂ (Fig. 2b). There is a transition between
two regimes around ĉ_tr = 10. Stiffness K̂ increases nonlinearly with
curvature when ĉ > ĉ_tr but is mostly insensitive to curvature when ĉ < ĉ_tr.
Increasing the longitudinal curvature has no effect on stiffness (Fig. 2b),
because these shells lack any analogue of the plantar fascia. Transverse
curvature stiffens the shell because out-of-plane longitudinal bending
induces in-plane stretching of the material of the shell close to the load
application point (Extended Data Fig. 2 and Supplementary
Information 2). Therefore, the transverse curvature has the effect of amplifying
the intrinsic stiffness of a flat plate, whereas the longitudinal curvature
has no similar effect.
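
The two quantities used in this comparison can be expressed in a few lines. This is a minimal sketch that assumes the stiffness of the curved shell and of the matching flat plate have already been measured or simulated, and that the normalized curvature has already been computed; the function names are hypothetical and only the transition value of 10 comes from the text.

```python
C_HAT_TRANSITION = 10.0  # approximate transition between the two regimes


def normalized_stiffness(k_curved, k_flat):
    """Ratio of the curved-shell stiffness to that of the matching flat plate.

    Both stiffness values must be in the same units, so the ratio is
    dimensionless; a value of 1 means curvature adds no stiffness.
    """
    if k_flat <= 0:
        raise ValueError("flat-plate stiffness must be positive")
    return k_curved / k_flat


def curvature_regime(c_hat, c_hat_tr=C_HAT_TRANSITION):
    """Label the regime of a shell from its normalized curvature."""
    if c_hat > c_hat_tr:
        return "curvature-sensitive"
    return "curvature-insensitive"


# Hypothetical shell that is three times stiffer than its flat counterpart.
k_hat = normalized_stiffness(k_curved=6.0, k_flat=2.0)
regime = curvature_regime(c_hat=25.0)
```

The labels are only a reading aid; the transition between the regimes is gradual, so values of the normalized curvature close to 10 should not be classified rigidly.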
Fig. 2 | Curvature-induced stiffness in mechanical models of hominin feet.
a, Continuum elastic shells with curvature were subjected to a distributed
vertical load at one end and clamped at the other. b, The shell data using
normalized stiffness (K̂) and normalized curvature (ĉ). The shells were
transversally (diamonds) or longitudinally (stars) curved. Inset, stiffness (K)
versus curvature (c) for continuum shells of various thicknesses (t) (blue
shading) in experiments (diamonds and stars) and simulations (circles). c, The
discrete foot mimics consisted of three metatarsals arranged in a transverse
arch and loaded at the distal end. Longitudinal springs at the hinged base
mimic the longitudinal ligaments in feet. Transversal inter-metatarsal springs
at the distal end mimic transverse elastic tissues. d, The foot mimic data using
normalized stiffness (K̂) and normalized curvature (ĉ). Inset, stiffness
(K) versus transverse curvature (c) for mimics of various lengths (L) and
thicknesses (t). Detailed views of the continuum and discrete experiments are
in Extended Data Figs. 3 and 4, respectively.

Fig. 3 | Three-point bending test on a cadaveric human foot. a, Fresh-frozen
cadaveric feet (n = 2) were thawed and mounted in a materials-testing machine
using an attachment at the transected shank. The distal end of the heel rested
on a sliding platform with low-friction roller bearings to enable changes in foot
length. The ball of the foot and the toes rested on a lubricated surface. The
transected shank was displaced downward and the reaction force was
measured. Tests were performed on intact feet and those with transversal cuts.
b, The transversal cuts between the toes and metatarsals (dashed blue lines)
were no deeper than the plantar plane of the metatarsal shafts. c, Displacement
versus force traces for an intact foot (solid black line) and a foot with partially
separated metatarsals (dashed blue line). Some stress relaxation was
observed during the initial few cycles of testing and the last cycle was used for
analyses.


We performed three-point bending tests on discrete mechanical
mimics of the foot with a TTA and found similar results to the continuum
shells (Fig. 2c, d). The mimics, which consisted of three metatarsals
with hinges towards the midfoot, are of length L, thickness t and
transverse curvature c (Methods and Supplementary Information 4). The
longitudinal springs at the hinges mimic the longitudinal midfoot
ligaments that contribute to midfoot stiffness whether arched or not
(Supplementary Information 1.4). The distally located transverse
springs mimic inter-metatarsal tissues that influence the predicted
bending-stretching coupling due to the transverse curvature. We find
that the normalized curvature ĉ accurately predicts the normalized
stiffness K̂ for discrete foot-like structures, as for continuum shells
(Fig. 2d; Methods, equation (2)). The transition in stiffness from nearly
curvature-insensitive to a nonlinear increase occurs around ĉ_tr = 3 for
the mimics. Although this value is different from continuum shells,
bending-stretching coupling is the common mechanism for
curvature-induced stiffness and ĉ emerges as the chief explanatory variable.
The role of the TTA in human feet could be found by measuring the
decrease in stiffness upon flattening the TTA; however, altering the
TTA would also affect other elements, such as the MLA. We therefore
designed a method that emulates flattening the TTA without altering
the skeletal structure. The main idea is that the transverse curvature
induces stiffness by coupling longitudinal bending with stretching of
the inter-metatarsal tissues, as shown by the analyses of the continuum
shells and mechanical mimics, and as is also evident in mathematical
models of rayed fish fins with transverse curvature. Therefore, cutting
the inter-metatarsal tissues should disrupt the stiffening mechanism
and emulate flattening the arch without altering the skeletal structure.
We tested this idea in the foot mimics by comparing the stiffness of
transversally curved mimics that lack the inter-metatarsal springs
with flat mimics that had all springs intact. Both had the same stiffness
(R = 0.98, slope = 1.05, intercept = 0) (Extended Data Fig. 5), showing
that cutting the transverse springs disengages the mechanism through
which transverse curvature increases the longitudinal stiffness.
To determine the contribution of the TTA to stiffness in human feet,
we performed three-point bending tests on two human cadaveric feet
(Fig. 3a, Methods and Supplementary Information 5.2) and assessed
the effect of selectively cutting the transverse tissues between the
metatarsals (T- condition) (Fig. 3b). To carefully preserve longitudinal
tissues, we cut only the transverse metatarsal ligaments, the skin
between the toes and the inter-metatarsal tissues below the dorsal
surface of the foot. The mechanical work to deform the foot is a measure
of stiffness (Supplementary Information 5.3) and cutting these
transverse tissues decreased stiffness by 44% and 54% for the two feet (Fig. 3b
and Extended Data Table 1). Each foot serves as its own control, thereby
quantifying the contribution of the TTA as the normalized stiffness
K̂ = K_intact/K_T-. We found K̂ = 1.77 and K̂ = 2.18 for the feet for which
ĉ = 15.4 and ĉ = 16.0, respectively (Fig. 4b; Methods, equation (5)).

Fig. 4 | Transverse curvature of extant and
used in our analyses and their respective
estimated survival dates: H. naledi, H. erectus,
H. habilis, A. afarensis and Burtele. Pan

The cadaveric experiments show that the inter-metatarsal tissues
contribute substantially to foot stiffness, and more than the previously
described contribution of the MLA and plantar fascia of 23% (Extended
Data Table 1 and Supplementary Information 1.4). In addition to curvature
of the TTA, the stiffness and slack of the inter-metatarsal tissues
as well as the mobility of the metatarsals may ultimately combine to
tune the longitudinal stiffness of the foot and thus influence the
curvature-stiffness relationship of the TTA. Therefore, additional data are
needed to find the precise curvature-stiffness relationship in human
feet. Nevertheless, the mechanistic understanding of transversally
curved structures suggests that the inter-metatarsal tissues affect
the longitudinal bending stiffness of the foot because the human TTA,
with ĉ = 15, is sufficiently arched to couple longitudinal bending and
transverse stretching.

We use ĉ to compare and track the evolution of the TTA among
hominins (Fig. 4 and Supplementary Information 5). At one extreme
are the feet of the vervet monkey, macaque, chimpanzee and gorilla,
which have ĉ < 3 and are substantially flatter than those of humans,
which have ĉ > 10. At the other extreme are species in the genus Homo,
including Homo naledi, Homo habilis and Homo erectus, that possess
a pronounced TTA with a human-like ĉ = 15. The estimated ĉ of the
approximately 3.4-million-year-old Burtele foot (from an unidentified
species) falls within the normal variation of humans despite having an
abducted hallux. By contrast, the estimated ĉ of the approximately
3.2-million-year-old Australopithecus afarensis (AL-333) falls below the
human range, despite a human-like torsion of the fourth metatarsal.

Additional data are needed, especially from earlier hominins such as
Ardipithecus; however, the available evidence suggests that there were
several stages in the evolution of the arch of the human foot. First, apes
such as chimpanzees and presumably the last common ancestor of apes
and hominins lack both a MLA and a TTA, and thus are able to stiffen the
midfoot only partially using muscles. By 3.4 million years ago, and
possibly earlier, a human-like TTA had evolved that may have increased
midfoot stiffness during propulsion in the Burtele hominin (Supplementary
Information 5.4). Compared with humans, the TTA was apparently less
developed in A. afarensis, which also lacked a fully developed MLA,
consistent with analyses of the 3.66-million-year-old Laetoli G footprints
that are thought to have been made by A. afarensis. Finally, in the genus
Homo we see a full MLA and TTA, enabling both effective walking and
running. These inferences need to be tested with additional fossils
incorporating not only analyses of the MLA but also the TTA.

Our findings show a previously undescribed and substantial role
for the TTA in midfoot stiffness. Traditional thinking in biomechanics,
human evolution and clinical practice, with an emphasis on the sagittal
plane and the MLA, should thus be expanded to incorporate the TTA
and the transverse axis that is orthogonal to the sagittal plane.

Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Ethical compliance 
The authors have complied with all relevant ethical regulations in
conducting the research for this paper.


Numerical simulations 

We simulated the elastic response of arched shells using the Shell
interface in the 3D Structural Mechanics module of COMSOL Multiphysics
v.5.1 (COMSOL AB). The TTA is represented by the map for the central
plane of the shell given by S: (x, y) ↦ (x, R_T sin θ, R_T cos θ), in which
θ = y/R_T, x ∈ [-L/2, L/2] and y ∈ [-w/2, w/2] (Extended Data Fig. 2). For all the
simulations, we set L = 0.1 m and w = 0.05 m. The material was modelled
as linearly elastic with Young's modulus E = 3.5 MPa, Poisson's ratio
ν = 0.49 and mass density ρ = 965 kg m⁻³.

The boundary at x = -L/2 is clamped, that is, zero displacements and
rotations. The conditions at the other boundary x = L/2 are a uniform
shear load T, zero bending moment along z and zero in-plane traction
so that the displacements are free (see Extended Data Fig. 2 for
orientations of the axes).

We solve this model for a range of thicknesses t, from 3 mm to 9 mm
in steps of 1 mm, and transverse curvature radii R_T = 0.03 m,
0.05 m, 0.07 m, 0.1 m, 0.3 m, 0.5 m, 0.7 m, 1 m and 3 m. For each
combination of c and t, shear T ranging from 0 N m⁻¹ to 1 N m⁻¹ is applied
in increments of 0.1 N m⁻¹. The resulting out-of-plane displacement
δz is measured (Extended Data Fig. 2b) and plotted against T. The
slope of these curves extrapolated to T = 0 yields the stiffness, defined
as K = wT/δz.
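
The stiffness extraction described above can be sketched numerically. The following is a minimal illustration, assuming the displacement responds linearly to the applied shear over the sweep, so that a least-squares line through the origin stands in for the slope extrapolated to zero load. The function name and the synthetic sweep values are hypothetical and are not taken from the simulations.

```python
import numpy as np

def stiffness_from_sweep(T, dz, w):
    """Estimate stiffness w*T/dz from a shear-load sweep.

    T  : applied shear per unit width (N/m)
    dz : resulting out-of-plane displacement (m)
    w  : shell width (m)
    Assumes a linear response, so the compliance dz/T is fitted as a
    least-squares slope constrained to pass through the origin.
    """
    T = np.asarray(T, dtype=float)
    dz = np.asarray(dz, dtype=float)
    compliance = np.dot(T, dz) / np.dot(T, T)  # metres per (N/m)
    return w / compliance                      # N/m

# Hypothetical sweep: 0 to 1 N/m in increments of 0.1 N/m, width 0.05 m.
T = np.linspace(0.0, 1.0, 11)
w = 0.05
made_up_stiffness = 200.0            # N/m, illustrative only
dz = w * T / made_up_stiffness       # synthetic linear response
K_est = stiffness_from_sweep(T, dz, w)
print(f"estimated stiffness: {K_est:.1f} N/m")
```

For a response that departs from linearity at higher loads, restricting the fit to the smallest loads would be closer in spirit to extrapolating the slope to zero load.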


Continuum shell experiments 

We fabricated and measured the stiffness of shells with an arch in the
transverse or longitudinal directions, and compared them against a flat
plate. These were all fabricated using polymer moulding techniques
with polydimethylsiloxane (PDMS). The mould was fabricated using
additive manufacturing (3D printed using ProJet 460Plus, 3D Systems).
The printed mould was a few millimetres in thickness, with one side
left open. A PDMS silicone elastomer (Sylgard 184, Dow Corning) was
used to cast the arch in the mould. Because the volume ratio of the base
polymer to the curing agent controls the material bulk modulus for
PDMS, the same ratio of five parts base polymer to one part of curing
agent by weight was consistently maintained across all fabricated
arches (Supplementary Information 3). During an experiment, the
fabricated arch was mounted on the experimental rig with the help of clamps
that were custom-fabricated to exactly match the arch curvature. The
clamps were additively manufactured (Stratasys Dimension 1200es)
with acrylonitrile butadiene styrene (ABSPlus) thermoplastic material
(glass transition temperature, 108 °C). One end of the clamped arch
was fixed to a rigid frame and the other end of the clamped arch was
pushed upon by a thin edge (knife edge) that was mounted on a force
sensor attached to a vertical translation stage (Extended Data Fig. 3a).
The forces were measured using a data-acquisition system (LabVIEW,
National Instruments) at 2 kHz for a duration of 1 s. The load test was
performed under quasi-static loading of the arch sample by providing
small displacements (quasi-static steps) of 5 × 10⁻⁵ m (50 µm) per
step for a total of 10 quasi-static steps (5 × 10⁻⁴ m or 500 µm). Forces
were measured after each quasi-static displacement. The slope of
the force-displacement curve is the stiffness K for the arch sample.
Three experimental runs were conducted for each arch and their
force-displacement curves were reproducible to within measurement
error.


Foot mimics 
We designed, fabricated and performed load-displacement tests on 
mechanical mimics of the foot that were transversally curved (Fig. 2, 
Extended Data Fig. 4 and Supplementary Information 4). The mimic 
consisted of three rigid metatarsals hinged at their bases. Instead of 
every bone in the foot, the mimics were simplifications that captured 
the longitudinal bending of the metatarsals and lumped all midfoot 
mobility into hinges at the proximal base of the metatarsals. 

The metatarsals were of length L and the hinges were arranged in a
transverse arch of curvature c so that the axis of each hinge was at an
angle with its neighbour (Fig. 2c and Extended Data Fig. 4a). Each hinge
had an extension spring held at a fixed moment arm equal to half the
thickness t and provided torsional stiffness (Extended Data Fig. 4b).
An inter-metatarsal transversally oriented spring connected adjacent
metatarsals at the distal end and would resist any splaying induced by
the transverse arch.

In hominin feet, the distal ends of the metatarsals are level on the
ground when loaded. Therefore, the presence of a TTA suggests increasing
torsion for the lateral metatarsals (Extended Data Fig. 6b, c). The
distal ends of the metatarsals in the mimics were made to rest on
horizontal, low-friction metallic platforms (Extended Data Fig. 4a). The
vertically staggered arrangement of the platforms mimics the effect
of the distal end of the metatarsals being on the same horizontal level.
The platforms were attached to a micrometre-precision translation
stage for applying vertical displacements. The bases of the hinges were
rigidly clamped to a six-axis force sensor (JR3) to measure the reaction
forces due to the displacement. Stiffness was estimated as the slope of
the force-displacement curve in each trial.

Multiple geometries were tested and the dimensions chosen to
approximate the metatarsal lengths and midfoot widths of hominin
feet, including chimpanzees and humans. The length L was varied from
75 to 125 mm (3 values), thickness t from 18.5 to 26.8 mm (3 values)
and curvature from 0 to 0.025 mm⁻¹ (6 values). The spring constants,
measured in an Instron materials testing machine, were 1.76 N mm⁻¹ and
0.70 N mm⁻¹ for the longitudinal and transverse springs, respectively.
Three trials were performed for each foot and the force-displacement
data were reproducible to within measurement error.

The normalized stiffness is K̂ = K/K_flat. For a flat mimic with three
metatarsals, each of length L, thickness t and having a longitudinally
oriented spring at its base of stiffness k_m, the longitudinal stiffness is
given by K_flat = 3k_m(t/2)²/L² (Supplementary Information 4.3). In a general
setting, the longitudinal spring stiffness would be proportional to the
width w of the midfoot by virtue of accommodating a greater amount
of parallel elastic tissues. Therefore, the longitudinal stiffness is
equivalently parameterized by the stiffness per unit width k_l = 3k_m/w.
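
As a worked illustration of the flat-mimic relation above, the sketch below evaluates three times the spring constant times the squared moment arm (half the thickness), divided by the squared length, using the longitudinal spring constant (1.76 N/mm) and the chimpanzee-like dimensions quoted in the Methods. The function and variable names are hypothetical, and the result is the value predicted by the formula, not a measured stiffness.

```python
def flat_mimic_stiffness(k_m, t, L, n_metatarsals=3):
    """Longitudinal stiffness of a flat mimic: n * k_m * (t/2)**2 / L**2.

    k_m : constant of each longitudinal spring (N/mm)
    t   : thickness, so the spring's moment arm is t/2 (mm)
    L   : metatarsal length (mm)
    """
    return n_metatarsals * k_m * (t / 2.0) ** 2 / L ** 2

def stiffness_per_unit_width(k_m, w, n_metatarsals=3):
    """Equivalent longitudinal spring stiffness per unit width: n * k_m / w."""
    return n_metatarsals * k_m / w

# Values quoted in the text for the chimpanzee-like flat mimic.
K_flat = flat_mimic_stiffness(k_m=1.76, t=18.5, L=75.0)   # N/mm
k_l = stiffness_per_unit_width(k_m=1.76, w=60.0)          # N/mm per mm of width
print(f"K_flat = {K_flat:.4f} N/mm, k_l = {k_l:.3f} N/mm^2")
```

The quadratic dependence on t/L is why modest changes in thickness or length shift the flat-mimic baseline substantially, which is the reason the measured stiffness is normalized by it.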

Supplementary equation (4.4) for the stiffness of a flat mimic was
independently verified using load-displacement tests of eight different
flat mimics (Extended Data Fig. 4c and Supplementary Information
4.4). We use this relationship to normalize the measured stiffness of
all of the mimics by a single chimpanzee-like flat mimic of length
L_0 = 75 mm, thickness t_0 = 18.5 mm and width w_0 = 60 mm, and for which
the measured stiffness is K_0. By definition, the normalized stiffness of
the chimpanzee-like flat mimic is K̂_0 = 1. Therefore, the measured
stiffness K of a mimic with length L, thickness t and width w is normalized
according to


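$$\hat{K} = \frac{K}{K_0}\left(\frac{L}{L_0}\right)^{2}\left(\frac{t_0}{t}\right)^{2}\frac{w_0}{w}. \qquad (2)$$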


Cadaveric feet

We conducted three-point bending tests using a materials testing
system (Instron model 8874) on two fresh-frozen cadaveric feet obtained
from posthumous female donors (age, 55 and 64 years; body weight,
1,023 N and 596 N). The loading protocol and boundary conditions
under the foot were as previously described. The tibia and fibula
were transected midshaft and implanted in Bondo Fibreglass Resin
(3M) and secured to the displacement-controlled force sensor on
the Instron actuator. The ankle was at a neutral angle of 90°. The heel
rested on a rigid platform that was mounted on low-friction sliders to
permit foot-length changes. The forefoot rested on a highly lubricated
surface to permit the foot to naturally deform in all directions when
loaded. The contact point on the heel was maintained at the posterior
end by placing the heel at the anterior edge of the sliding heel plate
so that the heel force mimics the action of the Achilles tendon. The
tests were quasi-static with a displacement rate of 0.5 mm s⁻¹ to
0.6 mm s⁻¹.

The displacement z_peak required to achieve a load of 3× the body
weight was measured and then cyclically applied 10-15 times. The last
cycle was used for analyses because there was some stress relaxation
during the first 6-7 cycles. The area under the curve of the displacement
z versus the force F is the work W needed to deform the foot.
Following Supplementary equation (5.4), W yields an effective stiffness
of the foot K_eff given by


$$K_{\mathrm{eff}} = \frac{2}{z_{\mathrm{peak}}^{2}} \int_{0}^{z_{\mathrm{peak}}} F\,\mathrm{d}z. \qquad (3)$$


The same measurements were repeated after bisecting the distal
transverse metatarsal ligaments, the skin between the toes, and the
muscles and fascia connecting the metatarsals. The inter-metatarsal
tissues were transected from the dorsal surface of the foot and
the cuts extended no deeper than the plantar plane of the metatarsal
shafts. Therefore, none of the branches of the plantar fascia or other
midfoot ligaments was affected.

Because the applied displacement was the same for the intact feet and
those with bisected inter-metatarsal tissues, the ratio of work is equal
to the ratio of the effective stiffness (Supplementary equation (5.5)).
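
The work-based stiffness measure can be illustrated with a short sketch. It assumes the effective stiffness is defined so that a linear spring of that stiffness would absorb the same work over the same peak displacement, that is, twice the work divided by the squared peak displacement; this is an assumption consistent with the description above rather than a transcription of the supplementary equation. The function names are hypothetical; the two work values (5.5 J and 3.1 J) are the sample 1 entries of Extended Data Table 1.

```python
import numpy as np

def work_done(z, F):
    """Area under the force-displacement curve by the trapezoidal rule (J)."""
    z = np.asarray(z, dtype=float)
    F = np.asarray(F, dtype=float)
    return float(np.sum(0.5 * (F[1:] + F[:-1]) * np.diff(z)))

def effective_stiffness(z, F):
    """Assumed definition: K_eff = 2 * W / z_peak**2, taking z[0] = 0."""
    z = np.asarray(z, dtype=float)
    z_peak = z[-1]
    return 2.0 * work_done(z, F) / z_peak ** 2

# Sanity check on a synthetic linear spring (values are made up).
z = np.linspace(0.0, 0.012, 50)      # m
F = 1000.0 * z                       # N, spring of 1000 N/m
K_spring = effective_stiffness(z, F)

# With equal peak displacement, the stiffness ratio equals the work ratio.
W_intact, W_cut = 5.5, 3.1           # J, sample 1 of Extended Data Table 1
stiffness_ratio = W_intact / W_cut
fractional_decrease = (W_intact - W_cut) / W_intact
print(f"ratio = {stiffness_ratio:.2f}, decrease = {fractional_decrease:.0%}")
```

Because both conditions share the same peak displacement, the prefactor cancels and only the two work values are needed to compare the intact and transected foot.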


Monte Carlo simulations 

Anatomical variability in the size of feet (Extended Data Table 2) is
incorporated using Monte Carlo simulations to generate statistics
for normalized curvature (Fig. 4). The histograms generated from
the Monte Carlo simulations are mostly non-Gaussian. Therefore, the
median and quartiles are reported in addition to the mean and s.d. We
used 1 million random combinations of the anatomical dimensions, in
which each dimension was drawn from an independent Gaussian
distribution with mean and s.d. values according to Extended Data Tables 2,
3. Increasing the size of the Monte Carlo beyond a million samples had
no effect on the statistics of the estimated quantities for the number
of significant digits reported. The Monte Carlo simulations probably
overestimated the variance of relevant ratios such as w/L and t/L in
comparison to hominin feet, because we use independent variation
of all dimensions and do not incorporate covariation that may exist.
Such inflation of variance because of an assumption of independence of
variables is evident when comparing primary measurements to Monte
Carlo estimation of ĉ for humans (Extended Data Table 2).
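
The Monte Carlo procedure can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the curvature is the fourth-metatarsal torsion (in radians) divided by the width, and that the normalized curvature is that curvature multiplied by the squared lever length and divided by the thickness; this form is an assumption of the sketch. The means and s.d. values are those read from the published-data human row of Extended Data Table 2; the function names and the random seed are hypothetical.

```python
import numpy as np

def normalized_curvature(theta_deg, L, w, t):
    # ASSUMPTION of this sketch: c = theta / w (theta in radians) and
    # c_hat = c * L**2 / t.
    return np.deg2rad(theta_deg) * L ** 2 / (w * t)

def monte_carlo_c_hat(means, sds, n_samples=1_000_000, seed=0):
    """Draw each dimension from an independent Gaussian and summarize c_hat."""
    rng = np.random.default_rng(seed)
    # Independent draws: any covariation between dimensions is ignored,
    # which tends to inflate the variance of ratios such as w/L and t/L.
    draws = {name: rng.normal(means[name], sds[name], n_samples) for name in means}
    c_hat = normalized_curvature(draws["theta_deg"], draws["L"], draws["w"], draws["t"])
    q1, median, q3 = np.percentile(c_hat, [25, 50, 75])
    return {
        "mean": float(c_hat.mean()),
        "sd": float(c_hat.std(ddof=1)),
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
    }

# Human values (published data) as read from Extended Data Table 2.
human_means = {"L": 200.0, "w": 60.0, "t": 18.0, "theta_deg": 23.6}
human_sds = {"L": 14.0, "w": 5.4, "t": 1.6, "theta_deg": 7.1}
stats = monte_carlo_c_hat(human_means, human_sds)
print(stats)
```

Reporting the median and quartiles alongside the mean and s.d. matters here because the ratio of Gaussian variables is skewed, so the mean sits above the median.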


Morphometrics of feet of extant species 
Humans. Human morphometrics were obtained from 12 individuals
(6 cadaveric, 6 human volunteers) using radiographic computed
tomography (CT X-ray imaging) and software-based segmentation and
three-dimensional model reconstruction. These feet were all evaluated
by a clinical radiologist and identified as non-pathological. The
collection, analyses and reporting of data from live human subjects were
approved by the Yale IRB. Details on the subjects and CT data-processing
methods are provided in Supplementary Information 5.1.

We measured the lever length L following the standard definition as
the distance from the posterior end of the calcaneus to the anterior
end of the distal head of the third metatarsal. The width w is
measured at the tarsometatarsal joint, as the mediolateral separation of
the most medial aspect of the distal articular surface of the medial
cuneiform to the most lateral aspect of the distal articular surface of
the cuboid. The thickness t is defined as the dorso-plantar thickness
of the proximal head of the third metatarsal, or the average of the
second and fourth, when the third metatarsal data are unavailable.
The curvature c is based on the torsion θ_MT4 of the fourth metatarsal,
which was measured using the shape of the articular surface using
established protocols.


Non-human primates 

Published data were used for morphometrics analysis of non-human
primates: P. troglodytes (n = 106), G. gorilla (n = 59),
C. aethiops (n = 56) and M. nemestrina (n = 44).

Published data are sparse and not all required measurements were
available for a single sample in the published literature for C. aethiops
and M. nemestrina. Therefore, we added data from specimens that
were most similar in their lever length L to the mean value reported
in the literature. We carried out these measurements using
software-based photogrammetry of high-resolution images and cross-verified
with measurements using a digital caliper (0.01 mm resolution). The
C.aethiops foot is from the Yale Biological Anthropology Laboratory 
(YBL.3032a) and the M. nemestrina specimen from the Yale Peabody 
Museum (YPM MAM 9621). 

The mean and s.d. of the lever length L were estimated from
published data for chimpanzee, gorilla, C. aethiops and M. nemestrina.
Mean w is estimated from reported w/L or dorsal skeletal
views for chimpanzees and gorillas, and primary measurements
for C. aethiops and M. nemestrina. To estimate the s.d. of w, we used
reported variability in the medio-lateral width of the proximal
metatarsal heads for all species to estimate the coefficient of variation
(s.d./mean), and applied that to w. The mean and s.d. of t were all obtained
from published values and confirmed with primary measurements
for available specimens. Torsion of the fourth metatarsal θ_MT4 is used
to estimate the transverse curvature and published values were used
for all non-human species included in this study. For species for
which the feet are regarded as flat, we used the same metatarsal torsion
values as P. troglodytes.


Fossil feet 

We used photogrammetry on published images of fossil feet (Fig. 4d),
as well as data tables that accompanied the publication of these fossil
data to estimate the necessary dimensions and ratios.

Among the fossil feet, all except the foot of H. naledi were
incomplete in some regard. For those incomplete feet, an extant species was
selected as a template by taking into consideration published analyses
of other postcranial and cranial elements. On the basis of this
information, H. sapiens was chosen as the template for H. erectus (Dmanisi)
and H. habilis (Olduvai hominin) and G. gorilla was chosen as the
template for A. afarensis (AL 333) and the unknown hominin foot found in
Burtele. For example, the sole fourth metatarsal of A. afarensis does
not permit the direct estimation of w. However, only the ratio w/L is
necessary for the analyses, and the ratio of gorilla is used for the Monte
Carlo analysis of the fossil. The metatarsal, however, provides a direct
measurement of t, but not of L. Therefore, to estimate the ratio t/L, we
incorporate the measured thickness t and the gorilla's ratio t_g/L_g, by
using the formula


$$\frac{t}{L} = \frac{t}{\langle t_g \rangle}\,\frac{t_g}{L_g}, \qquad (4)$$


in which ⟨t_g⟩ is the mean t of gorilla. This template-based estimation
therefore incorporates direct measurements where available, without 
assuming that the fossil exactly resembles the extant template. 


Curvature of hominin feet from metatarsal torsion 
Following standard practice in the literature, we use the torsion of
the fourth metatarsal (θ_MT4) to estimate TTA curvature. This measure
also facilitates the estimation of TTA curvature using partial or
disarticulated fossils. When the proximal metatarsal heads form a transverse
arch and the distal metatarsal heads rest on the ground, the lateral
metatarsals increasingly acquire torsion about their long axis (Fig. 4b and
Extended Data Fig. 6b, c). We compared the torsion-based estimate of
curvature versus using the external geometry of the dorsal surface of the
skeleton and found good correspondence (Extended Data Fig. 6d and
Supplementary Information 5.1). The torsion θ_MT4 arises from the
curvature c over the width w of the tarso-metatarsal articulation and
therefore the curvature is approximated by c = θ_MT4/w. Using equation (1),
the torsion-based estimate of the normalized curvature parameter
for the TTA is

$$\hat{c} = \frac{\theta_{\mathrm{MT4}} L^{2}}{w t}. \qquad (5)$$


Reporting summary
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The data supporting the findings of this study are available within the 
paper and its Supplementary Information.

Extended Data Fig. 1 | Illustrated anatomy of the foot. a, Identification of the
bones of the foot that are referred to in the main text. The cuneiforms, cuboid
and the navicular are collectively referred to as the tarsal bones. b, The plantar
fascia, a tough elastic band, extends from the calcaneus to the distal end of the
phalanges. The fascia splits and rejoins at multiple locations. c, The long plantar,
short plantar and calcaneonavicular ligaments are located in the midfoot and
are primarily longitudinally oriented. The deep and superficial transverse
metatarsal ligaments are examples of stiff, transversally oriented elastic
tissues between the metatarsals. Anatomical images are from Primal Pictures.


Extended Data Fig. 2 | Mathematical and computational analysis of
continuum elastic shells. a, The shell is clamped at one end and loaded with a
knife edge at the other. It is of length L, width w, thickness t and has radius of
curvature R (curvature c = 1/R). b, The free end displaces by a height δz on
loading and reaction forces at the clamped end resist deformation. c, A
cross-sectional view of the shell shows the location of the neutral plane, if the shell
were to act as an elastic beam. d, Out-of-plane (z-axis) displacement profile for
one numerical simulation of a shell (L = 0.1 m, w = 0.05 m, t = 0.003 m,
R = 0.03 m). Most of the displacement happens close to the loaded edge, unlike
an elastic beam. e, A component of the second Piola-Kirchhoff stress is shown as
a colour map of the undeformed shell. In an elastic beam, the intersection of the
neutral plane with the shell (c) would exactly match the locations of zero stress.
Because of curvature-induced in-plane stretching, the zero-stress curve differs
from the neutral plane predictions in the vicinity of the loaded edge and, to a lesser
extent, near the clamped boundary.

Extended Data Fig. 3 | Experimental characterization of arched shells. a, The
experimental set-up used in stiffness measurements. b, A magnification of the
shell from underneath shows how a curvature-matched edge-loading
attachment was used to mimic a theoretical knife edge. A curvature-matched
clamp was fixed and glued to the other end of the shell. c, Representative data
that show the linearity of the force-displacement data. The best-fit quadratic is
indistinguishable from the linear fit to within sensor resolution. d, e, The
Young's modulus (d) and Poisson's ratio (e) of the PDMS material used to
fabricate the shells were estimated from simultaneous stress and strain
measurements during an extension test of a rectangular PDMS block.

Extended Data Fig. 4 | Design and characterization of discrete mechanical
foot mimics. a, Experimental arrangement for load-displacement
measurements. The distal loading platforms for the three metatarsals are
staggered in height so that all three metatarsals are loaded vertically despite
the transverse curvature. In hominin feet, this is accomplished by the
metatarsal torsion. b, Side view of a single metatarsal showing length L and
thickness t of the foot mimics. The effect of thickness is to provide a moment
arm for the longitudinal spring and thus affect the rotational stiffness of the
hinge. c, Mimics with three different thicknesses were fabricated and the
thickness was estimated using load-displacement measurements on
curvature-free flat mimics. The accuracy of the estimated thickness values is
evaluated by plotting the predicted stiffness based on the thickness estimates
against the measured stiffness. Details of the thickness estimation technique
and statistics of the stiffness-stiffness correlation are provided
in Supplementary Information 4.4.

Extended Data Fig. 5 | Effect of cutting the transverse springs in mechanical foot mimics. Stiffness of transversally curved foot mimics lacking the transverse
inter-metatarsal springs (T-) is strongly correlated with the stiffness of flat mimics with intact transverse inter-metatarsal springs.

Extended Data Fig. 6 | Transverse curvature of hominin feet. a, Definitions of
length L and width w. b, Definition of the thickness t. The fourth metatarsal is
highlighted in green. The distal heads of the metatarsals rest flat on the ground
and the proximal heads are raised away from the ground to different degrees
because of the TTA. c, Schematic showing the accrual of torsion on the lateral
metatarsals about their long axis. The curvature of the TTA was estimated using
the torsion of the fourth metatarsal θ_MT4. In addition, the average curvature was
also estimated using the angle of the normal to the dorsal surface of the fourth
metatarsal measured in the midfoot (Supplementary equation (5.3)).
d, Linear regression of the two methods to estimate TTA curvature. Details of
the curvature estimation procedure and statistical results of the regression are
provided in Supplementary Information 5.1. 
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Extended Data Table 1| The estimated work during foot deformation 


Species Foot condition Variable Value 
Cadaveric data collected for this study 
Homo sapiens, sample 1 intact Wh 5.5 J 
—transverse tissue Wr- 31d 
(Wa — Wo-)/Wr 44% 
Homo sapiens, sample 2 intact Wr 745 
—transverse tissue Wy 3.45 
5A% 
Homo sapiens intact Wr 10.15 
—plantar fascia Wapr- 785 
(Wr — Wrpt-)/Wr 23% 
—long plantar ligament Wrap 4.6 J 
(Wipt-—Watp-)/Wh 32% 
~short plantar ligament Wasp- 215 
(Wrap-—Whsp-)/Wn 25% 
~calcaneonavicular ligament = Wh,cn— DIT 
(Wasp-—When-)/Wn 9% 
Macaca nemestrina intact Wn 5.1 J 
plantar fascia removed Writ 5.0 J 
(Win — Wanyot-)/Win 2% 
—long plantar ligament Wrn,ip— 31d 
(Wrnpt-—Wintp—)/Wmn 37% 
—calcanconavicular ligament Win,en— 245 
(Wmn,p- — Winen—)/Wm 14% 
Chlorocebus aethiops _intact We 3.95 


Data are obtained from cadaveric tests and from published load versus displacement data for humans, C. aethiops and M. nemestrina. In addition to the foot deformation work of the intact 
human foot (W_h), the cadaveric experiments performed in this study included the transection of the transverse intermetatarsal elastic tissues, shown as W_h,tr-. The peak displacements in the tests 
were 12 mm and 18 mm for samples 1 and 2, respectively. The published data for the three species include intact feet (W_h, W_m and W_c) and feet with transection of the plantar fascia (W_pf-), the 
long plantar ligament (W_lp-), the short plantar ligament (W_sp-) and the calcaneonavicular ligament (W_cn-). These estimates were obtained by digitizing the published plots of load 
versus displacement and measuring the area under the curve as the foot was loaded. The contribution of each of the transected tissues is represented as the ratio of the decrease in work 
after transection to the intact stiffness of the same foot. The previously published transections were performed in the same sequence as listed in this table. Raw data are available for the two 
cadaveric specimens in Supplementary Information. 
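
The two quantities described in this legend, work as the area under a load versus displacement curve and tissue contribution as the drop in work normalized by the intact value, can be sketched as follows. This is a minimal illustration and not the analysis code used for the table: the function names are hypothetical, the trapezoidal rule is an assumed choice of integration, and the example curve is invented. Only the 5.1 J and 5.0 J values are taken from the M. nemestrina rows.

```python
def work_from_curve(displacement_m, load_n):
    """Trapezoidal area under a load-displacement curve, in joules."""
    if len(displacement_m) != len(load_n):
        raise ValueError("displacement and load must have the same length")
    work = 0.0
    for i in range(1, len(load_n)):
        step = displacement_m[i] - displacement_m[i - 1]
        work += 0.5 * (load_n[i] + load_n[i - 1]) * step
    return work


def contribution(work_before, work_after, work_intact):
    """Decrease in work after a transection, as a fraction of the intact work."""
    return (work_before - work_after) / work_intact


# M. nemestrina, intact (5.1 J) versus plantar fascia removed (5.0 J).
print(round(100 * contribution(5.1, 5.0, 5.1)))  # 2

# Invented linear curve: 0 to 1,000 N over 0 to 12 mm.
print(work_from_curve([0.0, 0.006, 0.012], [0.0, 500.0, 1000.0]))
```

Each contribution is normalized by the intact work rather than by the work before that particular cut, so the percentages for successive transections of one foot can be added.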


Extended Data Table 2 | Foot morphometrics for extant species 


Species                  L (mm)         w (mm)         t (mm)         θ_MT4 (deg)    ĉ
                         mean   SD      mean   SD      mean   SD      mean   SD      mean   SD
Homo sapiens (a)         177    16.9    50.7   4.0     16.1   1.6     25.0   4.6     16.9   2.7
Homo sapiens (b)         200    14.0    60.0   5.4     18.0   1.6     23.6   7.1     15.6   5.6
Chlorocebus aethiops     85.0   4.3     24.0   1.2     9.0    0.45    0      2.5     0.0    1.5
Macaca nemestrina        100    6.0     35.0   2.1     10.0   0.6     0      2.5     0.0    1.3
Pan troglodytes          130    13.0    52.0   5.2     13.0   1.3     0      2.5     0.0    1.2
Gorilla gorilla          176    17.6    72.5   7.3     16.0   1.6     2.2    1.5     dal    0.8

(a) Primary data collected by us from 12 samples. 
(b) Bootstrapped Monte Carlo analysis using published data. 


Metrics were obtained from primary data for humans and Monte Carlo estimates for all species. For the Monte Carlo estimation, the dimensions are modelled as Gaussian random variables. 
Mean ± s.d. values were obtained from values reported in the literature (see 'Morphometrics of feet of extant species' in the Methods for details). Although the primary data were smaller feet 
than the published data, the ratios w/L and t/L were almost equal. The morphometric variables are the lever length of the foot L, width of the tarso-metatarsal articular region w, dorso-plantar 
thickness of the third metatarsal t and torsion of the fourth metatarsal θ_MT4. From these, the normalized curvature parameter ĉ was estimated. 
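
The idea of treating each dimension as a Gaussian random variable and propagating it into a ratio such as w/L can be sketched as below. This is a simplified illustration under stated assumptions, not the bootstrapped procedure used for the table: the function name is hypothetical, the two dimensions are drawn independently, and the inputs are illustrative values in millimetres. The expression for the normalized curvature is given in the Methods and is not reproduced here.

```python
import random
import statistics


def ratio_distribution(mean_a, sd_a, mean_b, sd_b, n_draws=20000, seed=0):
    """Draw a/b with a and b sampled independently from Gaussians."""
    rng = random.Random(seed)
    return [rng.gauss(mean_a, sd_a) / rng.gauss(mean_b, sd_b)
            for _ in range(n_draws)]


# Illustrative inputs (mm): width 52.0 +/- 5.2, lever length 130 +/- 13.0.
w_over_l = ratio_distribution(52.0, 5.2, 130.0, 13.0)
print(round(statistics.mean(w_over_l), 2),
      round(statistics.stdev(w_over_l), 2))
```

Sampling the ratio directly, rather than dividing the two means, keeps the spread of both dimensions in the resulting estimate.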
Extended Data Table 3 | Fossil morphometric data 


Species Specimen θ_MT4 (deg) L (mm) w (mm) t (mm) w/L t/L 
H. naledi UW 101-1456 38.0 137.0 38.0 16.0 0.277 0.117 
H. erectus D2669, D4165 28.0, 29.0 - - 17.0 ge Hy E 
H. habilis OH 8 25.0 112 44.0 “J 0.393 r 
Burtele BRT VP2/73 26.5 - - 30 ge 
A. afarensis AL 333-160 17.0 - - VW a one 


Values of L, w, t and θ_MT4 used in estimating the normalized curvature ĉ of fossil samples. Variable names with a subscript h refer to human values (for example, t_h), subscript p to chimpanzee (for 
example, w_p) and subscript g to gorilla (for example, L_g). These values are represented by normal distributions as shown in Extended Data Table 2. Variables in angled brackets refer 
to the mean value shown in Extended Data Table 2. See Methods for details of source materials. 
Neural circuitry linking mating and egg 
laying in Drosophila females 


Fei Wang, Kaiyu Wang, Nora Forknall, Christopher Patrick, Tansy Yang, Ruchi Parekh, 
Davi Bock & Barry J. Dickson 


13 January 2020 


Mating and egg laying are tightly coordinated events in the reproductive life of all 
oviparous females. Oviposition is typically rare in virgin females but is initiated after 
copulation. Here we identify the neural circuitry that links egg laying to mating status 
in Drosophila melanogaster. Activation of female-specific oviposition descending 
neurons (oviDNs) is necessary and sufficient for egg laying, and is equally potent in 
virgin and mated females. After mating, sex peptide (a protein from the male seminal 
fluid) triggers many behavioural and physiological changes in the female, including 
the onset of egg laying. Sex peptide is detected by sensory neurons in the uterus, 
and silences these neurons and their postsynaptic ascending neurons in the 
abdominal ganglion. We show that these abdominal ganglion neurons directly 
activate the female-specific pC1 neurons. GABAergic (γ-aminobutyric-acid-releasing) 
oviposition inhibitory neurons (oviINs) mediate feed-forward inhibition from pC1 
neurons to both oviDNs and their major excitatory input, the oviposition excitatory 
neurons (oviENs). By attenuating the abdominal ganglion inputs to pC1 neurons and 
oviINs, sex peptide disinhibits oviDNs to enable egg laying after mating. This circuitry 
thus coordinates the two key events in female reproduction: mating and egg laying. 


Published online: 26 February 2020 


We reasoned that egg laying is likely to depend on cell types that are 
female-specific and hence express one or both of the sex-determination 
genes fruitless (fru) and doublesex (dsx). In particular, egg laying is 
blocked by either silencing or masculinizing all fru+ neurons. Some 
of these fru+ neurons are descending interneurons, which project from 
the brain to the ventral nerve cord and are thought to convey high-level 
motor commands. We therefore focused on female-specific fru+ 
descending neurons and used the split-GAL4 technique to obtain 
two driver lines that label two female-specific fru+ dsx− cholinergic 
descending neurons per brain hemisphere (Fig. 1a, b, Extended Data 
Figs. 1-3). In optogenetic activation experiments using Chrimson, 
both split-GAL4 driver lines reliably induced oviposition behaviour in 
mated females, with most but not all females also depositing an egg 
(Fig. 1c, d, Supplementary Video 1; we presume that not all females had 
an egg in the uterus at the time of neuronal activation). Accordingly, 
we refer to these neurons as oviposition descending neurons (oviDNs), 
and to the two split-GAL4 driver lines that label them as oviDN-SS1 and 
oviDN-SS2 (in which SS denotes stable split-GAL4). Stochastic labelling 
of single neurons resolved two morphologically distinct types 
of oviDN, which we refer to as oviDNa and oviDNb cells (Fig. 1b). In 
an electron microscopy volume of a full adult female brain (FAFB), 
we identified two oviDNa-like cells and one oviDNb-like cell in each 
hemisphere (Fig. 1b, Supplementary Video 2). 

Egg laying by mated females was completely blocked by genetic 
ablation of oviDNs, and markedly reduced by their chronic silencing 
(Fig. 1e, Extended Data Fig. 4a, b). Virgin females in which oviDNs were 
ablated were as receptive to mating as control females (Extended Data 


Fig. 4c). Several days after mating, the ovaries of oviDN-ablated females 
contained many mature eggs, and most carried either a fertilized egg 
or a first-instar larva in the uterus (Fig. 1f). We conclude that oviDNs 
are essential for oviposition, but dispensable for mating, ovulation 
and fertilization. 

We were unable to generate driver lines that specifically target 
oviDNa or oviDNb cells. To determine which oviDN subtype is involved 
in oviposition, we therefore performed a stochastic 'unsilencing' experiment, 
in which a tdTomato-tagged silencing transgene was targeted 
to all oviDNs, but stochastically replaced in some of these cells with 
GFP. Individual females were assayed for egg laying over five days after 
mating, then dissected and stained to determine their complement of 
red (tdTomato; silenced) and green (GFP; unsilenced) oviDNs. Females 
with no unsilenced cells laid no or very few eggs, whereas those with 
just a single functional oviDN cell generally laid large numbers of eggs 
(Fig. 1g, Extended Data Fig. 5). The number of eggs laid per female was 
variable in these cases, but there was no appreciable difference between 
females in which an oviDNa cell was unsilenced and those in which an 
oviDNb cell was unsilenced, nor between females in which either one 
or two cells of either type were functional. Although the oviDNa and 
oviDNb subtypes differ in their morphology (and probably their connectivity 
and physiology), these data suggest that they nonetheless 
have similar functions in oviposition. 

Oviposition involves a coordinated and highly stereotyped sequence 
of motor actions that progresses from abdomen bending to ovipositor 
extrusion and egg deposition (Fig. 2a). Abdomen bending, 
ovipositor extrusion and egg deposition were all eliminated in females 


Janelia Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA. Queensland Brain Institute, University of Queensland, St Lucia, Queensland, Australia. Present address: 
Department of Neurological Sciences, University of Vermont, Burlington, VT, USA. These authors contributed equally: Fei Wang, Kaiyu Wang. e-mail: dicksonb@janelia.hhmi.org 
Fig. 1 | oviDNs control oviposition. a, Confocal images showing brain (top), 
ventral nerve cord (middle) and abdominal ganglion (bottom) of an oviDN-SS1 
female, stained to reveal oviDN membranes (UAS-myrFLAG; green), presynaptic 
sites (UAS-syeHalo; red) and all synapses (nc82; blue). Scale bars, 50 μm. 
b, Electron microscopy (EM) reconstructions (top) and confocal light microscopy 
(LM) images (bottom) of single oviDNa and oviDNb neurons. c, Snapshots of 
the oviposition sequence that was induced after photoactivation of oviDNs 
(5 s, 635 nm, 261 μW mm−2; Supplementary Video 1). d, Percentage of mated 


in which oviDNs were ablated (Fig. 2b). Conversely, abdomen bend- 
ing and ovipositor extrusion were reliably triggered by strong pho- 
toactivation of oviDNs in either virgin or mated females (Fig. 2c). Egg 
deposition was also induced, but only in mated females (presumably 
because mating is required to stimulate ovulation). In all of these oviDN 
activation experiments, the sequence of motor actions was the same 
as that in natural egg laying (Fig. 2a, c). By varying the stimulus inten- 
sity, we found that egg deposition has a higher activation threshold 
than abdomen bending and ovipositor extrusion (Fig. 2d), and that 
action latencies were shorter at higher stimulus intensities (Fig. 2e). 
Fig. 2 | oviDNs induce the oviposition motor sequence with equal potency in 
virgin and mated females. a, Ethograms of oviposition actions by mated wild-type 
females, aligned to the onset of egg deposition. b, Frequency of 
oviposition motor actions by mated females. ***P < 0.001 by Wilcoxon test. 
Scatter plots show mean ± s.e.m. c, Ethograms of oviposition actions that were 
induced by photoactivating oviDNs in mated and virgin females, aligned to the 
onset of the light stimulus. Pink bars indicate 2 s of 635-nm illumination. 
females that exhibited oviposition and egg deposition after illumination. 
e, Number of eggs laid per female in the 48 hours after mating. f, Percentage 
of females with an arrested egg (arrowhead in image) or embryo in the uterus 
10 days after mating. g, Number of eggs laid in the five days after mating by 
females with all but one or two oviDNs silenced. ***P < 0.001, **P < 0.01, *P < 0.05 
by Fisher's exact test (d, f) or Wilcoxon test (e, g). Scatter plots show 
mean ± s.e.m. (e, g). 


Moreover, at low stimulus intensities, the oviposition sequence was 
often truncated, but an action was never skipped, and only once did 
we observe a single action occurring out of order (in a total of 38 flies 
at each of 3 intensities; Extended Data Fig. 6). These data suggest that 
oviDNs may use a ramp-to-threshold mechanism to elicit the successive 
motor actions of oviposition. Notably, the activation thresholds 
and action latencies were indistinguishable between virgins and mated 
females (Fig. 2d, e), indicating that mating status regulates egg laying 
through the brain circuits upstream of oviDNs rather than through 
downstream motor circuits. 
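
A ramp-to-threshold scheme of the kind suggested here can be sketched in a few lines. The sketch is purely illustrative and makes several assumptions that go beyond the data: the internal drive rises linearly at a rate proportional to light intensity, each action fires when the drive crosses its own threshold, and the threshold values and the `gain` parameter are hypothetical. The data show only that egg deposition has a higher threshold than the other two actions; the relative order of the first two thresholds is assumed.

```python
# Hypothetical thresholds, in arbitrary units of drive.
ACTIONS = [
    ("abdomen bending", 1.0),
    ("ovipositor extrusion", 2.0),
    ("egg deposition", 3.0),
]


def elicited_actions(intensity, duration_s, gain=1.0):
    """Return (action, latency_s) pairs reached by a linear ramp."""
    slope = gain * intensity  # drive gained per second of illumination
    reached = []
    if slope <= 0:
        return reached
    for name, threshold in ACTIONS:
        latency = threshold / slope
        if latency > duration_s:
            break  # ramp ends first: the sequence is truncated, not reordered
        reached.append((name, latency))
    return reached


print(elicited_actions(intensity=0.5, duration_s=2.0))
print(elicited_actions(intensity=2.0, duration_s=2.0))
```

In this toy version a weak stimulus truncates the sequence without skipping or reordering actions, and a stronger stimulus shortens every latency, which is the qualitative pattern described above.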
d, e, Percentage of flies that exhibited the indicated actions (d) and latencies to 
action onsets, shown as mean ± s.e.m. (e) at varying light intensities. Egg 
deposition was assessed in different flies at each light intensity (n = 28-46). 
Other actions were examined on the same set of flies in order of increasing 
light intensity (virgin, n = 14; mated, n = 24). There was no significant difference 
between virgin and mated females (by Fisher's exact test (d) or Wilcoxon 
test (e)). 
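
For readers unfamiliar with the test used for the percentage data, a two-sided Fisher's exact test on a 2 × 2 table can be written with the standard library alone. This is a generic sketch and not the analysis script behind the figure: the function name is hypothetical, the two-sided rule (summing all tables no more probable than the observed one) is one common convention, and the counts in the example are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided P value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Invented counts: responders and non-responders in two groups.
print(fisher_exact_two_sided(12, 2, 21, 3))
```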


The onset of egg laying after mating is induced by sex peptide, a 
protein of the male seminal fluid that is detected by sex-peptide sensory 
neurons (SPSNs) of the uterus. Sex peptide silences both SPSNs 
and their postsynaptic targets in the abdominal ganglion, the SAG neurons. 
Artificially activating either SPSNs or SAG neurons suppressed 
egg laying in mated females (Fig. 3a, Extended Data Fig. ). Conversely, 
ablating (Fig. 3b) or silencing these cells increased the number of 
eggs laid by virgin females. Virgin egg laying as a result of SPSN or SAG 
ablation depended on oviDNs, as egg laying was prevented if these 
cells were co-ablated (Fig. 3b). SPSN and SAG activity is thus critical in 
keeping oviDNs inactive until after mating. This inhibition is most likely 
to be indirect, because the SAGs are cholinergic and hence probably 
excitatory (Extended Data Fig. 3). We identified and extensively traced 
the ascending projections of the two SAG neurons in the FAFB volume 
(Fig. 3c) and found just a single synapse from SAG neurons to oviDNs 
(Extended Data Table 1). 

The targets of SAG neurons in the brain have not been identified. 
Because SAG neurons regulate female receptivity as well as egg laying, 
we speculated that their targets could include the female-specific 


fru− dsx+ pC1 neurons in the protocerebrum, which are known to regulate 
receptivity. Within the FAFB volume we identified five morphologically 
distinct pC1 cells in each hemisphere, which we refer to as 
pC1a-pC1e (Fig. 3d, Extended Data Fig. 7a, Supplementary Video 3). Our 
extensive tracing of single pC1a, pC1c and pC1e cells, as well as more 
limited tracing of pC1b and pC1d cells, suggests that the SAG neurons 
provide numerous synaptic inputs to the pC1a, pC1b and pC1c cells, with 
fewer if any direct inputs to pC1d and pC1e cells (Fig. 3d, Extended Data 
Table, Supplementary Video 4). We performed whole-cell recordings 
from individual pC1 neurons while photoactivating the SAGs (Fig. 3d-f, 
Extended Data Fig. 7b), and found that pC1a cells were strongly depolarized, 
pC1b cells were weakly depolarized and pC1c, pC1d and pC1e cells 
showed little or no response upon SAG activation (Fig. 3d-f). There 
were numerous synaptic connections amongst all five pC1 subtypes, 
however (Extended Data Table 1), suggesting that any information on 
mating status that is obtained from SAG neurons by pC1a and pC1b cells 
is potentially shared across the entire set of pC1 cells. 

We obtained two split-GAL4 driver lines for pC1 neurons: pC1-SS1, 
which labels pC1a, pC1c and pC1e, and pC1-SS2, which labels all five 
Fig. 3 | pC1 neurons suppress egg laying and oviDN activity and are regulated 
by the sex-peptide pathway. a, b, Number of eggs laid by mated females in the 
24 hours after mating (a), or by virgin females in the days after eclosion (b). 
c, Confocal image (top) and electron microscopy reconstruction (bottom) of 
SAG neurons. d, Confocal images (top) and electron microscopy 
reconstructions (middle) of distinct pC1 subtypes, including the number of 
SAG-to-pC1 synapses detected in the electron microscopy volume (asterisks 
indicate pC1 cells that were only partly traced). Bottom, example traces of the 
changes in membrane potential in pC1 cells after photoactivation of SAG 
neurons (1 s at 625 nm; red line) before (−Mec) and after (+Mec) mecamylamine 
application. Darker traces were averaged from lighter ones. e, Schematic of 
experimental design. f, Peak response in each pC1 subtype (pC1a-pC1e; 
labelled as a-e in the figure) after photoactivation of SAG neurons, before and 
after mecamylamine application. g, h, Number of eggs laid by virgin females in 
the days after eclosion (g), or by mated females in the 24 hours after mating 
(h). i, Schematic of experimental design (top), and example traces (bottom) 
showing the changes in membrane potential in an oviDNb cell after 
photoactivation of pC1 neurons before (−PTX) and after (+PTX) picrotoxin 
application. j, Maximum changes in oviDNa and oviDNb membrane potential in 
response to photoactivation of pC1 cells. ***P < 0.001, **P < 0.01, *P < 0.05 by 
Wilcoxon test. Scatter plots show mean ± s.e.m. (a, b, f-h, j). 
Fig. 4 | oviDNs integrate mating status and substrate signals through 
distinct upstream pathways. a, Upstream neurons of three oviDNs identified 
by electron microscopy reconstruction, showing the number of oviDN input 
synapses. b, Synaptic connectivity amongst four cell types in the right 
hemisphere. c, Electron microscopy reconstructions (top) and confocal images 
(bottom) of oviENs and oviINs. d, e, Example traces and plots of the changes in 
membrane potential in oviDNs that were evoked by photoactivating (1 s at 
625 nm) oviENs (d) or oviINs (e), before and after application of mecamylamine 
or picrotoxin. f, Changes in the fluorescence signal of the calcium sensor 
GCaMP6s in oviINs in response to photoactivation of pC1 neurons. g-j, Number 


pC1 cells (Extended Data Fig. 1). Ablation of pC1 cells using either 
driver resulted in an increase in egg laying in virgin females that was 
dependent on oviDN function (Fig. 3g), whereas mated females in which 
pC1 neurons were chronically activated laid fewer eggs (Fig. 3h). Brief 
optogenetic silencing of pC1 neurons in virgins did not acutely trigger 
egg laying, as would be expected if pC1-inactivated virgins (like pC1-intact 
mated females) rely on additional substrate-borne cues for the 
induction of egg laying (Extended Data Fig. 7c, d). 

These behavioural data indicate that, similar to SPSNs and SAG 
neurons, pC1 neurons suppress the function of oviDNs and therefore 
suppress egg laying in virgin females. Consistent with this interpretation, 
we found by in vivo imaging that basal calcium levels in pC1 
neurons, although variable, are generally higher in virgin than mated 
females (Extended Data Fig. 7e). Moreover, whole-cell recordings from 
oviDNs revealed that both oviDNa and oviDNb cells are hyperpolarized 
after photoactivation of pC1 neurons (Fig. 3i, j, Extended Data Fig. 5d), 
of eggs laid by mated females in the 24 hours after mating (g, j), or by virgin 
females in 5 days (h) or 8 days (i) after eclosion. k, Schematic of the in vivo 
calcium-imaging experiment. The brain of a female with partially removed 
head cuticle (green oval) is imaged as substrates are presented to the legs 
sequentially using an elevator platform. l-n, Changes in GCaMP6s signal in 
oviDNs (l), oviENs (m) or oviINs (n) in virgin or mated females. o, Model for the 
coordination of mating and egg laying. Solid lines indicate monosynaptic 
connections. ***P < 0.001, **P < 0.01, *P < 0.05 by Wilcoxon test; NS, not 
significant. Scatter plots show mean ± s.e.m. (d-j, l-n). 


and that this effect is sensitive to picrotoxin, a chloride channel blocker 
(Fig. 3i, j). This inhibition is probably indirect, because pC1 neurons are 
cholinergic (Extended Data Fig. 3) and have very few synapses onto the 
oviDNs (Extended Data Table 1). 

To look for inhibitory intermediates from pC1 to oviDN cells (as well 
as excitatory inputs that might stimulate egg laying upon detection of 
a preferred substrate), we reconstructed the synaptic inputs to oviDNa 
and oviDNb cells in the FAFB volume (Fig. 4a, b, Extended Data Table 2). 
We obtained sparse split-GAL4 driver lines for the two cell types with 
the largest numbers of oviDN input synapses (Fig. 4c, Extended Data 
Fig. 1, Supplementary Video). Whole-cell recordings reliably showed 
changes in membrane potential in oviDNs after photoactivation of 
either of these two cell types (Fig. 4d, e). The cell type with the most 
oviDN input synapses is cholinergic (Extended Data Fig. 3), and activation 
of these cells depolarized oviDNs (Fig. 4d). We therefore named 
these cells oviposition excitatory neurons (oviENs). The cell type with 


the second-highest number of oviDN input synapses is GABAergic 
(Extended Data Fig. 3), and activation of these cells hyperpolarized 
oviDNs (Fig. 4e). Accordingly, we named these cells oviposition inhibitory 
neurons (oviINs). There is a single oviEN and a single oviIN per 
hemisphere, and they are reciprocally connected (Fig. 4a-c, Extended 
Data Table 1). The oviINs are also reciprocally connected with pC1 cells 
(Fig. 4b, Extended Data Table 1), and calcium-imaging experiments 
showed that photoactivation of pC1 cells elicits an excitatory response 
in oviINs (Fig. 4f). The pC1 cells have few direct synaptic connections 
with oviENs, and we did not detect any connections between SAG neurons 
and either oviINs or oviENs (Extended Data Table 1). 

Silencing oviENs in mated females strongly suppressed egg laying 
(Fig. 4g), similarly to the effect observed when oviDNs were silenced 
(Fig. 1e). By contrast, potentiating oviENs in virgin females caused them 
to lay significantly more eggs than control virgins (Fig. 4h), albeit not as 
many as mated females (presumably because ovulation remains infrequent). 
Manipulating oviIN activity had the opposite effects: silencing 
oviINs caused virgins to lay significantly more eggs (Fig. 4i), whereas 
depolarizing oviINs reduced the number of eggs laid by mated females 
(Fig. 4j). Thus, as expected from the sign of their inputs to oviDNs (that 
is, excitatory for oviENs; inhibitory for oviINs), oviENs promote egg 
laying, whereas oviINs inhibit it. 

We hypothesized that oviENs could mediate the external sensory 
signals that trigger egg laying in mated females, which are likely to 
include both gustatory and mechanosensory cues from the substrate. 
When provided with a choice of substrates, females lay more eggs on 
agarose medium than on a hard surface or a substrate of agarose and 
sucrose (Extended Data Fig. 8a-c). We therefore performed in vivo 
calcium imaging to determine the responses of oviDNs, oviENs and 
oviINs to the presentation of each of these substrates to the legs 
(Fig. 4k, Extended Data Fig. 8d, e). In oviDNs, we observed an increase 
in calcium levels only upon contact with the agarose substrate (Fig. 4l). 
This response was stronger in mated females than in virgins (Fig. 4l). 
The agarose-and-sucrose substrate elicited a small reduction in calcium 
levels, which was more pronounced in virgin females (Fig. 4l). The 
oviENs showed a positive calcium response to agarose but to neither 
of the other two substrates, and this response was indistinguishable 
between virgins and mated females (Fig. 4m). The oviINs responded to 
all three substrates, but more strongly to agarose and sucrose than to 
agarose alone, and only weakly to the hard surface (Fig. 4n). Regardless 
of substrate, ovilN responses were stronger in virgins than in mated 
females (Fig. 4n). 

In conclusion, our findings support the following model for the neural 
coordination of mating and egg laying in Drosophila (Fig. 4o). The 
oviDNs control the entire oviposition motor programme. They receive 
excitatory input from oviENs, which respond to stimulatory cues from 
the substrate, and inhibitory input from oviINs, which convey information 
about mating status from pC1 cells. In virgins, increased activity of 
pC1 neurons potentiates oviIN-mediated inhibition of both oviDNs and 
oviENs, which suppresses egg laying. After mating, sex peptide silences 
SAG inputs onto pC1 neurons, thereby decreasing the activity of pC1 
neurons and oviINs to facilitate egg laying when a preferred substrate is 
encountered. Reciprocal connections between oviINs and oviENs might 
ensure that oviDNs respond to oviEN activation with the appropriate 
temporal pattern and dynamic range, through feed-forward and feedback 
inhibition, respectively. The oviDNs, oviENs and oviINs all have 
numerous synaptic inputs in addition to those that we have described 
here, all of which remain functionally uncharacterized. These inputs 
may mediate other controls on the egg-laying process, such as the presence 
of an egg in the uterus and the nutritional state of the female. 
The pC1 neurons might also regulate other female behaviours that 
switch after mating, perhaps through different sets of output neurons. 
Notably, the male counterparts of pC1 neurons are thought to encode 

an analogous state of courtship arousal that modulates command 
pathways for specific motor actions such as courtship song and 
licking. Thus, functionally analogous but anatomically divergent 
circuits, shaped during development by fru and dsx, could account 
for the distinct reproductive behaviours of Drosophila males and 
females. 
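
The logic of the proposed model can be made concrete with a toy steady-state calculation. The sketch below is an illustration only and should not be read as a fitted or published model: it uses rectified-linear units, feed-forward connections only (the reciprocal oviIN/oviEN and oviIN/pC1 connections are omitted), and every weight is a hypothetical value chosen for readability.

```python
def relu(x):
    """Rectify: firing rates cannot be negative."""
    return max(0.0, x)


def ovidn_drive(mated, substrate_cue,
                w_sag_pc1=1.0, w_pc1_iin=1.0,
                w_iin_en=0.5, w_iin_dn=1.0, w_en_dn=1.0):
    """Steady-state oviDN activity in a toy feed-forward circuit.

    All weights are hypothetical and in arbitrary units.
    """
    sag = 0.0 if mated else 1.0        # sex peptide silences SPSNs and SAGs
    pc1 = relu(w_sag_pc1 * sag)        # SAG neurons excite pC1 neurons
    oviin = relu(w_pc1_iin * pc1)      # pC1 neurons excite oviINs
    ovien = relu(substrate_cue - w_iin_en * oviin)   # oviINs inhibit oviENs
    return relu(w_en_dn * ovien - w_iin_dn * oviin)  # net drive onto oviDNs


for mated in (False, True):
    for cue in (0.0, 1.0):
        print(mated, cue, ovidn_drive(mated, cue))
```

With these illustrative weights, oviDN activity is non-zero only when the female is mated and a substrate cue is present, which mirrors the disinhibition described in the text.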
Methods 


Data reporting 

No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Flies 

Flies were reared on standard cornmeal-agar-molasses medium or 
protein-enriched food at 25 °C with relative humidity of around 50% 
and a 12 h/12 h light/dark cycle, unless otherwise noted. Fly stocks used 
in this study are described and listed in Supplementary Tables 1 and 2. 


GAL4 screen for neurons that regulate oviposition 
We searched image collections of generation 1 GAL4 driver lines for 
those that potentially labelled subsets of fru+ or dsx+ neurons. Several 
hundred selected lines were then screened by examining light-evoked 
behavioural changes in females using UAS-CsChrimson. Several GAL4 
lines were found to evoke oviposition, and all labelled, amongst various 
other cells, a common set of descending neurons that were present in 
females but not males. We then sought to obtain specific split-GAL4 
driver lines for these descending neurons. 


Split-GAL4 screening and stabilization 

Split-GAL4 lines used in this study have p65ADZp and ZpGAL4DBD 
inserted at the attP40 site and attP2 site, respectively, except for 
pC1-SS2, which has p65ADZp inserted at the attP2 site and ZpGAL4DBD 
inserted at the first coding exon of dsx. p65ADZp and ZpGAL4DBD 
lines labelling neurons of interest were identified using a colour depth 
MIP (maximum intensity projection) mask search. The expression of 
selected combinations of p65ADZp and ZpGAL4DBD was then examined 
with a UAS reporter (20xUAS-CsChrimson-mVenus in attP18) by 
immunofluorescence staining and confocal microscopy 
(https://www.janelia.org/project-team/flylight/protocols). Finally, the combinations 
of p65ADZp and ZpGAL4DBD that gave the most specific expression 
patterns were stabilized by putting the two hemi-drivers in the same 
flies, and SS (denoting stable split-GAL4) numbers were assigned. 
Images of split-GAL4 lines used in this study can be viewed at 
http://splitgal4.janelia.org/cgi-bin/splitgal4.cgi. 


Stochastic unsilencing 

An FRT-MCS-tdTomato-FRT fragment was chemically synthesized 
(GenScript) and inserted into the pJFRC7 20XUAS-IVS-mCD8::GFP backbone. 
The Kir2.1 coding sequence was synthesized (Integrated DNA 
Technologies) and subcloned into 
20XUAS-IVS-FRT-MCS-tdTomato-FRT-mCD8::GFP to create the in-frame tdTomato fusion. The resulting 
plasmid was inserted by phi-C31-mediated transgenesis into the 
VK00005 landing site (GenetiVision). Females carrying hs-FlpL2:PEST 
in attP3, UAS-FRT-Kir2.1::tdTomato-FRT-mCD8::GFP in VK00005 and 
oviDN-SS1 or oviDN-SS2 were heat-shocked during the first-instar larval 
stage or prepupal stage, with 2-4 1-hour incubations at 37 °C separated 
by 1-hour intervals at 22 °C. Virgin females were collected shortly 
after eclosion and kept in groups of 10-20 females on standard 
cornmeal-agar-molasses medium before being mated with wild-type males 
4 days later. Females that successfully mated were then kept individually 
and their eggs were counted for 5 consecutive days. Females were then 
killed and examined for Kir2.1::tdTomato or mCD8::GFP expression in 
oviDNs by immunofluorescence staining and confocal microscopy. 


Neuron tracing in the FAFB 

Neuron skeletons in a serial section transmission electron microscopy 
volume of the adult female Drosophila brain were manually traced 
using the annotation software CATMAID (http://www.catmaid.org). 
Neuroanatomical landmarks in the electron microscopy volume such 
as fibre tracts, cell body size and position and neuropil boundaries 


were used to search for potential candidates of the oviDNs, SAG and 
pC1 neurons. The process of finding relevant neurons was consistent 
for these cell types, relying on distinguishing features such as cell body 
position and tract orientation, and overall dendritic projection patterns 
in the confocal images. We then searched for corresponding areas of 
cell body position in the electron microscopy volume and followed 
the primary neurite emerging from the cell body as it formed fibre 
bundles and traversed the brain in an orientation that matched the 
data in the confocal images. Just enough of the primary and secondary 
neurites (backbone) of each potential candidate was traced to compare 
with confocal data, and neurons that lacked prominent morphological 
features in the electron microscopy volume were eliminated from 
consideration. Three oviDNs, five pC1 neurons and one SAG neuron 
were found in each hemisphere. The morphologies of oviDNs and pC1 
neurons varied slightly within each group. One oviDNa, one oviDNb and 
the pC1a, pC1c and pC1e neurons on the right hemisphere were traced 
to completion. Synapses were marked on these neurons using previously 
described criteria for a chemical synapse. In brief, we annotated 
instances in which the oviDNs, SAG and pC1 neurons were presynaptic 
and postsynaptic. Presynaptic locations were identified by the presence 
of a T-bar at an active zone with vesicles, and postsynaptic sites by the 
presence of postsynaptic densities (PSDs) across a synaptic cleft. At 
presynaptic locations in the oviDNa, SAG and pC1 neurons we identified 
postsynaptic neurons that contained PSDs and marked these as 
downstream partners; and at sites at which PSDs were present in the 
oviDNa, SAG and pC1 neurons, we identified locations of T-bars in the 
presynaptic neurons and marked these as upstream partners. Only 
upstream partners of oviDNb were identified and marked. 

One oviIN and one oviEN were identified as the upstream partners in 
FAFB with the most connections to oviDNs. We traced both neurons just 
enough to confirm their identity. We then traced their arbors within 
the superior medial protocerebrum (SMP) neuropil to completion as 
there was extensive overlap with the oviDNs in this neuropil. Within 
the SMP, we identified and marked all synapses between all neurons of 
interest (oviIN, oviEN, three oviDNs and five pC1 neurons). 


Electrophysiology 

For ex vivo patch recordings, flies aged 3-5 d were immobilized on
ice for around 30 s, and the nervous system was dissected out in
extracellular solution (ECS) containing 103 mM NaCl, 3 mM KCl,
5 mM N-Tris(hydroxymethyl)-methyl-2-aminoethane-sulfonic acid,
10 mM trehalose, 10 mM glucose, 2 mM sucrose, 26 mM NaHCO3, 1 mM
NaH2PO4, 1.5 mM CaCl2 and 4 mM MgCl2 (pH 7.1-7.3 when bubbled with
95% (v/v) O2/5% (v/v) CO2, around 290 mOsm). The pia and glia sheath
over the somata of interest were carefully removed with fine forceps
(Dumont #5SF, Fine Science Tools). The explant was subsequently
mounted on a poly-D-lysine (Thermo Fisher Scientific)-coated coverslip
with the somata of the target neurons facing up, and then transferred
to an upright Nikon Eclipse FN1 microscope equipped with a 40×/0.8
water-immersion objective (CFI APO NIR, WD = 3.5 mm, Nikon). A glass
micropipette with resistance of 10-15 MΩ (B150-86-7.5, Sutter
Instrument) was prepared on a horizontal puller (P-1000, Sutter
Instrument), and filled with intracellular solution containing 140 mM
K-gluconate, 10 mM HEPES, 1 mM KCl, 4 mM MgATP, 0.5 mM Na3GTP, 1 mM
EGTA and 1% neurobiotin (SP-1120, Vector Laboratories) (pH near 7.3,
around 285 mOsm). Cytoplasmic GCaMP6s was expressed in the neurons of
interest and visualized under 470-nm illumination to guide placement
of the electrode. After obtaining a whole-cell patch, the data were
collected with a Multiclamp 700B amplifier (Molecular Devices),
low-pass-filtered at 2 kHz and acquired at 10 kHz with a Digidata 1440A
digitizer (Molecular Devices), and analysed offline in MATLAB (MathWorks).
A small hyperpolarizing current (less than 10 pA) was injected to
hold the membrane potential around −65 mV. For Chrimson activation,
a 625-nm fibre-coupled LED (M625F1, Thorlabs) was placed
around 5 mm away from and pointed to either the brain or the ventral
nerve cord (power intensity about 2 mW mm^-2). Light stimulations
were controlled by using Clampex (Molecular Devices) via the
digitizer. For blocking nicotinic acetylcholine receptors or chloride
channels, samples were bathed in ECS containing mecamylamine
(10 µM, M9020, Sigma-Aldrich) or picrotoxin (150 µM, P1675,
Sigma-Aldrich), respectively, for 15 min to allow the action of the
antagonists.


Calcium imaging 

Calcium imaging was performed at 21 °C on a customized two-photon
microscope equipped with a 12-kHz resonant scanner (CRS 12 kHz,
Cambridge Technology), piezo objective scanner (P-725K129, Physik
Instrumente) with a controller (E-709, Physik Instrumente) and an Apo
LWD 25×/1.1 water-immersion objective (Nikon). Z-stacks of 40 frames
of either 512 × 512 pixels or 600 × 512 pixels were taken at 0.99 Hz to
cover a volume of the sample, and GCaMP6s signal was captured by
a photomultiplier tube (Hamamatsu Photonics) under the illumination
of a two-photon laser (Chameleon, Coherent) tuned to 920 nm.
The software ScanImage (Vidrio Technologies) was used to control
image acquisition and synchronize stimulations. Each imaging session
produced 260 volumes and lasted around 260 s. The samples were
continuously perfused with ECS.

For ex vivo imaging, flies aged 4-6 days were immobilized on ice
for around 30 s. The nervous system was then dissected out in ECS
and mounted on a poly-D-lysine-coated coverslip. After the sample was
placed under the objective, a 625-nm fibre-coupled LED (M625F1, Thorlabs)
was placed around 5 mm away from and pointed to the brain (power
intensity around 2 mW mm^-2) to provide light stimulations.

For in vivo imaging, flies aged 4-6 days were immobilized on ice for
about 5 min and then inserted into a rectangular hole (18 mm × 1 mm)
on a thin plastic sheet that was the bottom of a customized imaging
chamber. The orientation of the fly's head was adjusted so that the
antennae were beneath the plastic sheet, and the posterior head cuticle
was facing above. Small amounts of ultraviolet (UV) curing adhesive
(Loctite 352, Henkel) were applied at gaps between the fly and the
hole to fix the fly in position, with a brief (around 10-s) UV irradiation
(CS2010, Thorlabs). The six legs and abdomen of the fly could move
freely. After filling the chamber with ECS, an observation window was
opened on the head cuticle over the posterior part of the brain, and
fat tissue as well as the trachea covering the posterior brain was gently
removed with forceps. The oesophagus and muscles 1 and 16 were cut,
to minimize the movement of samples. A small plastic stage (around
1 cm in diameter) was placed around 5 mm underneath the fly, and a
manipulator (MP-285, Sutter Instrument) was used to elevate the stage
to let the legs of the fly touch and stand on the stage for a period of time.
The stage was covered with 1% plain agarose, 1% agarose containing
150 mM sucrose, or nothing. Each touch lasted approximately 10 s
(10 volumes) with intervals of around 30 s (30 volumes).

Analysis of calcium-imaging data was done offline in Fiji and
MATLAB. In brief, z-stacks from each imaging session were averaged
across all 260 time points to get a z-stack with a higher signal-to-noise
ratio, which was used as a reference for identification of the neurons
or neurites of interest. Then, slices covering the neurons or neurites of
interest were averaged at each time point to get a time series of
projection images. Sample movements during the imaging session were
corrected by using TurboReg in Fiji. Regions of interest (ROIs) were then
selected by drawing polygons on the corrected time series. For each
ROI, the time course of the GCaMP6s signal was obtained by averaging
the fluorescence intensity of every pixel inside that ROI at each time
point. The averaged fluorescence values over 10 time points before
and after the onset of each stimulus were used as the baseline (F0)
and response (F), respectively. The absolute change in fluorescence
intensity (ΔF) was calculated by subtracting F0 from F, and the
fluorescence-intensity changes relative to baseline (ΔF/F0) in each ROI
were obtained.
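
The baseline and response calculation described above can be sketched in a few lines. This is only an illustrative Python sketch, not the MATLAB code used in the study: the function name delta_f_over_f0, the list-based trace and the synthetic values are assumptions introduced here. It takes the mean of the 10 time points before stimulus onset as F0, the mean of the 10 time points from onset onwards as F, and returns ΔF and ΔF/F0.

```python
from statistics import mean


def delta_f_over_f0(trace, onset, window=10):
    """Return (F0, F, dF, dF/F0) for one stimulus onset in one ROI trace.

    trace  : mean ROI fluorescence at each time point (hypothetical input)
    onset  : index of the first time point of the stimulus
    window : number of time points averaged before and after onset
    """
    if onset < window or onset + window > len(trace):
        raise ValueError("stimulus onset too close to the edge of the trace")
    f0 = mean(trace[onset - window:onset])   # baseline before onset
    f = mean(trace[onset:onset + window])    # response after onset
    d_f = f - f0                             # absolute change
    return f0, f, d_f, d_f / f0              # last value is dF/F0


# Synthetic trace: flat baseline of 100, then a step to 150 at index 10.
trace = [100.0] * 10 + [150.0] * 10
f0, f, d_f, ratio = delta_f_over_f0(trace, onset=10)
print(f0, f, d_f, ratio)  # 100.0 150.0 50.0 0.5
```

Whether the response window starts exactly at the onset frame or one frame later is not stated in the text; the sketch assumes it starts at the onset frame.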


Immunofluorescence staining 

Most of the immunofluorescence staining was performed by following
the standard protocols described previously. Detailed protocols
for double-label staining, polarity staining and stochastic labelling in
multiple colours are available at
https://www.janelia.org/project-team/flylight/protocols. For determining
the cell types that were labelled by
a particular split-Gal4 driver, polarity staining was used to count the
total number of cells, and stochastic labelling in multiple colours was
performed to reveal the morphology of individual cells. Fluorescence
in situ hybridization was performed as described previously.

For staining of Kir2.1::tdTomato and mCD8::GFP, the central nervous
system was prepared in ECS and fixed in 4% paraformaldehyde (PFA;
sc-821692, Santa Cruz) at 22 °C for 15 min. After being washed in
phosphate-buffered saline containing 0.5% (v/v) Triton X-100 (PBT) for 30
min at 22 °C, the sample was incubated in blocking buffer (50062Z,
Thermo Fisher Scientific) containing primary antibodies including
rabbit anti-dsRed (1:500, 632496, Takara Bio), chicken anti-GFP (1:500,
A10262, Thermo Fisher Scientific) and mouse anti-Bruchpilot (nc82,
1:25, DSHB) for 24-48 h at 4 °C. The sample was then washed in PBT for
2 hours before being incubated in blocking buffer containing secondary
antibodies: AF546-conjugated goat-anti-rabbit (1:300, A11035, Thermo
Fisher Scientific), AF488-conjugated goat-anti-chicken (1:300, A32931,
Thermo Fisher Scientific) and AF647-conjugated goat-anti-mouse
(1:300, A21235, Thermo Fisher Scientific) at 4 °C for 24 h. After being
washed in PBT for 30 min at 22 °C, the sample was dehydrated and
mounted on a slide.

For staining of neurobiotin loaded into neurons during whole-cell
recording, the central nervous system was dissected out and
processed as described above, except that AF647-conjugated streptavidin
(1:500, S21374, Thermo Fisher Scientific) was included with the primary
and secondary antibodies, and AF405-conjugated goat-anti-mouse
(1:300, A31553, Thermo Fisher Scientific) was used instead of
AF647-conjugated goat-anti-mouse antibody.


Dehydration and DPX mounting 

After incubation with secondary antibodies, the sample was washed in
PBT for 15 min, fixed in 4% PFA for 10 min and sequentially dehydrated
for 5 min in 30%, 50% and 75% ethanol. The sample was mounted onto a
poly-D-lysine-coated coverslip in 75% ethanol, and further dehydrated
in 100% ethanol for 10 min. The coverslip was then submerged in xylene
(X5, Thermo Fisher Scientific) for 5 min, before being mounted to a drop
of DPX mountant (50-980-370, Thermo Fisher Scientific) on a slide. The
slide was left to dry for 24 h before performing confocal microscopy.


Confocal microscopy and image analysis 

Confocal imaging was performed under an LSM 800 or an LSM 880
inverted confocal microscope (ZEISS), with a Plan-Apochromat 20×/0.8
M27 objective or a Plan-Apochromat 63×/1.4 oil-immersion objective
(ZEISS). Images were captured using ZEN software (ZEISS), and later
analysed using Fiji and VVDViewer
(https://github.com/takashi310/VVD_Viewer).


Behavioural assays and analysis

The flies used in behavioural assays were sorted and collected under
light CO2 anaesthesia 1-6 h after eclosion. Virgin females were kept in
groups of 3-10 flies in vials, and males were singly housed in small food
chambers (7 mm × 7 mm × 35 mm). Flies used in optogenetic assays were
reared on food containing 0.2 mM all-trans-retinal (Sigma-Aldrich) in
darkness, before and after eclosion.

For assessing egg laying by virgin females, 4-5 flies were grouped
on standard cornmeal-agar-molasses medium in single vials.
The flies were transferred to new vials containing fresh food every 24 h
(at around zeitgeber time (ZT) 2), and the number of eggs laid in each
vial was manually counted under a stereo microscope. For assessing
egg laying by mated females, virgin females were first mated with
wild-type males in courtship chambers (diameter 10 mm, height 2 mm),
and subsequently kept individually in vials containing protein-enriched
medium, or, for the stochastic unsilencing experiments, standard
cornmeal-agar-molasses medium. The number of laid eggs was counted
as described above every 24 h.

For the experiment in which the position of eggs in the reproductive
organs was determined, female flies were flash-frozen in liquid nitrogen
and subsequently dissected in ECS under a stereo microscope. The
reproductive organs were carefully uncovered by removing the cuticle
over the ventral abdomen, and the presence of an egg in the uterus,
oviducts and ovaries was assessed.

For examination of natural oviposition, 30 virgin females aged
4-6 d and 35 wild-type males aged 3-5 d were grouped in a food vial
containing wet yeast paste, which boosts egg production while
preventing the females from laying lots of eggs. After 4-5 d, single females
were transferred by gentle aspiration into an observation chamber
(10 mm × 30 mm × 10 mm, with a 5 mm × 10 mm groove at the centre).
A small amount of cornmeal-agar-molasses medium was placed in
the central groove as an egg-laying substrate. The chamber was kept in
darkness with infrared illumination (880 nm) from below. The
behaviour of the female around the food was videotaped from the side at a
rate of 30 frames per second (fps) for 20 min.

For optogenetic activation, females were kept in darkness before
being transferred into the observation chamber (diameter 10 mm or
18 mm, height 2 mm) by gentle aspiration. A customized LED panel
capable of emitting infrared (880 nm) and red light (635 nm) was placed
beneath the chamber to provide uniform backlight for the camera
as well as red-light stimulations. The intensity and temporal pattern
of light were controlled by using a customized program written in
MATLAB. A camera (Manta-12SC, Allied Vision) was placed above the
chamber to video the behaviour of flies at 30 fps. The infrared-cut
filter that came with the camera was removed to allow the detection
of infrared light.

For high-speed videotaping with optogenetic activation, we modified
a previously described set-up. In brief, individual females climbed
upwards through a tunnel to a rectangular platform (4 mm × 2 mm)
that was surrounded by a groove filled with water. The platform was
illuminated by infrared light (850 nm) and was focused by LEDs
providing light stimulations (5 s of continuous 625-nm illumination of
200 µW mm^-2). The behaviour of the fly on the platform was videoed
(Ace, Basler) from the side at a rate of 200 fps.

To analyse the actions performed by female flies during natural or
light-induced oviposition behaviour, videos were manually analysed
offline. Three actions (abdomen bending, ovipositor extrusion and egg
deposition) were analysed. Abdomen bending was defined as frames
in which the abdomen was bent such that a line connecting the haltere
and the abdominal tip met the thoracic midline at an angle of 15° or
larger. Ovipositor extrusion was defined as any frame in
which the ovipositor of the female was extruded. Egg deposition was
defined as frames in which an egg was laid on the substrate.
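
The videos in this study were scored manually, but the 15° abdomen-bending criterion can be illustrated geometrically. The Python sketch below is purely hypothetical: it assumes that four landmarks (haltere, abdominal tip, and an anterior and a posterior point on the thoracic midline) have been annotated as 2D image coordinates in a side-view frame, and all function and parameter names are introduced here for illustration only.

```python
import math

BEND_THRESHOLD_DEG = 15.0  # criterion stated in the text


def bend_angle_deg(haltere, abdomen_tip, midline_anterior, midline_posterior):
    """Angle (degrees) between the haltere-to-tip line and the thoracic midline.

    All arguments are (x, y) tuples; the landmark annotation itself is an
    assumption of this sketch, not part of the published method.
    """
    ax = abdomen_tip[0] - haltere[0]
    ay = abdomen_tip[1] - haltere[1]
    mx = midline_posterior[0] - midline_anterior[0]
    my = midline_posterior[1] - midline_anterior[1]
    cross = ax * my - ay * mx   # proportional to the sine of the angle
    dot = ax * mx + ay * my     # proportional to the cosine of the angle
    return abs(math.degrees(math.atan2(cross, dot)))


def is_abdomen_bent(haltere, abdomen_tip, midline_anterior, midline_posterior):
    """True if the frame meets the 15-degree bending criterion."""
    angle = bend_angle_deg(haltere, abdomen_tip,
                           midline_anterior, midline_posterior)
    return angle >= BEND_THRESHOLD_DEG


# Straight abdomen: the haltere-to-tip line is parallel to the midline.
print(is_abdomen_bent((0, 0), (1, 0), (0, 0), (1, 0)))  # False
# Abdomen deflected by 45 degrees relative to the midline.
print(is_abdomen_bent((0, 0), (1, 1), (0, 0), (1, 0)))  # True
```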

For receptivity assays, one virgin female aged 3-6 d and one wild-type
male aged 3-5 d were transferred into a courtship chamber (diameter
10 mm, height 2 mm) by gentle aspiration, and videoed under white-light
illumination for a period of 30 min. The copulation rate was
checked every 2 min.

For egg-laying preference assays, we adapted a set-up developed
previously. In brief, 30 virgin females aged 4-6 d and 35 wild-type
males aged 3-5 d were grouped in a food vial containing wet yeast paste,
which boosts egg production but limits egg deposition. After 4-5 d,
single females were gently aspirated into the observation chamber,
which contains two 1% agarose grooves, one with and one without
sucrose. The female's behaviour was videoed for 12 h and the eggs laid
were counted. The preference index was calculated as the difference
between egg numbers on the two grooves divided by the total number
of eggs.
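
As a worked illustration of the preference index, the short Python sketch below follows the sign convention given in the Extended Data Fig. 8 legend (eggs on plain agarose minus eggs on the other substrate, divided by the total). The function name and the choice to return None when no eggs were laid are assumptions made here; the text does not say how such cases were handled.

```python
def preference_index(eggs_plain, eggs_other):
    """Preference index: (plain - other) / (plain + other).

    Returns None when no eggs were laid (an assumption of this sketch),
    because the index is undefined in that case.
    """
    total = eggs_plain + eggs_other
    if total == 0:
        return None
    return (eggs_plain - eggs_other) / total


# A female laying 30 eggs on plain agarose and 10 on sucrose agarose.
print(preference_index(30, 10))  # 0.5
```

With this convention the index ranges from -1 (all eggs on the other substrate) to +1 (all eggs on plain agarose), and 0 indicates no preference.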


Statistics 

All egg-laying and electrophysiology data were analysed by unpaired
Wilcoxon signed-rank test. Egg-position data were analysed by
Fisher's exact test. All of the statistical analyses were performed using R
software or MATLAB.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability
The datasets generated during the current study are available from 
the corresponding author on reasonable request. 

Extended Data Fig. 1 | Split-GAL4 driver lines targeting oviDNs, SPSNs, SAG
neurons, pC1 neurons, oviENs and oviINs. Confocal images of the central
nervous system from female and male flies carrying the indicated split-GAL4
driver lines as well as UAS-myrFLAG or UAS-CsChrimson-mVenus. Samples were
stained with the monoclonal antibody nc82 to reveal all synapses (magenta),
and with anti-FLAG or anti-GFP to reveal the membranes of targeted neurons
(green). Scale bar, 100 µm. Both oviDN-SS1 and oviDN-SS2 label a single oviDNa
and a single oviDNb cell in each hemisphere; oviDN-SS2 also weakly labels an
unrelated cell (pMPI) that is present in both sexes.

Extended Data Fig. 2 | Expression of fru and dsx in oviDNs and pC1 neurons.
Confocal images of female brains showing the co-labelling of oviDN-SS lines with
fru-LexA but not dsx-LexA, and of the pC1-SS1 line with dsx-LexA but not fru-LexA.
Scale bars, 20 µm.

Extended Data Fig. 3 | Neurotransmitter types revealed by fluorescence in situ
hybridization. Confocal images showing the expression of GAD1, ChAT and
vGluT in oviDNs, SAG neurons, pC1 neurons, oviENs and oviINs in female brains.
Red arrows indicate cell bodies of interest. Scale bars, 20 µm.

Extended Data Fig. 5 | Stochastic labelling and unsilencing of oviDNs.
a, Images of two female samples in which a single oviDNa or oviDNb cell is
labelled, as shown in Fig. 1b. Arrowheads indicate branches that are present in
oviDNb (solid) but absent in oviDNa (open). The branch that is labelled by
arrowhead 1 was primarily used to distinguish oviDNa from oviDNb. b, Example
images of brains in which oviDNs were either silenced (red; Kir2.1::tdTomato) or
unsilenced (green; mCD8::GFP). The number of unsilenced oviDNs in each
sample is shown. Green arrowheads indicate distinctive branches of oviDNb.
Brains were counterstained with nc82 (blue). Scale bar, 100 µm. c, Number of
eggs laid in the five days after mating by mated females with different oviDNs
unsilenced. ***P < 0.001, **P < 0.01 by Wilcoxon test. Scatter plots show
mean ± s.e.m. d, Confocal images of two samples in which a single oviDN was
loaded with neurobiotin during whole-cell recording. The samples were
stained with streptavidin (to reveal the recorded cell, yellow) and nc82 (blue).
Arrowheads indicate oviDNb-specific branches. Scale bars, 100 µm.

Extended Data Fig. 6 | Sequence of oviposition actions after oviDN
stimulation. Example ethograms showing the onsets of oviposition actions in
mated females after photoactivation (3 s) of oviDNs at varying light intensities.
Each row represents a single female.

Extended Data Fig. 7 | Anatomical and functional characterization of pC1
neurons. a, Confocal images of single pC1 neurons in the female brain, as
shown in Fig. 3d. Arrowheads indicate the presence (solid) or absence (open) of
subtype-specific branches. b, Confocal images of neurobiotin-filled pC1
neurons from which whole-cell patch recordings were obtained, indicating the
branches that were used for subtype identification as in a. c, d, Number of eggs
laid by virgin females during a one-hour (c) or three-day (d) period in which
either SAG or pC1 neurons were optogenetically silenced. e, Basal GCaMP6s
signals in pC1 cell bodies in virgin and mated females. ***P < 0.001 by Wilcoxon
test; scatter plots show mean ± s.e.m. (d, e).

Extended Data Fig. 8 | Egg-laying substrate preferences and
substrate-evoked calcium responses in oviDNs, oviENs and oviINs. a, Image of the
egg-laying chambers in each of which an individual mated female had laid numerous
eggs. Chambers with plain agarose (blue box), agarose containing 150 mM
sucrose (red box) and plastic surface (green box) are indicated. b, Total number
of eggs laid by individual mated females in a 12-h observation period. **P < 0.01,
***P < 0.001 by Wilcoxon test. c, Preference indices showing the preference of
female flies for laying eggs on different substrates. Preference index (PI) is
calculated as (number of eggs on plain agarose - number of eggs on other
substrate)/total number of eggs. Data are mean ± s.e.m. d, Projected images of
oviDNs (top), oviENs (middle) and oviINs (bottom) expressing GCaMP6s,
showing ROIs for quantification. e, Example ΔF/F0 traces for each ROI upon
presentation of the indicated substrates, in virgin (left) and mated (right)
females. Horizontal bars indicate presentation of the substrate. Darker traces
are averaged from six trials (lighter traces).


Extended Data Table 1 | Synaptic connections identified by electron microscopy reconstruction


Post

Pre Cell ID SAG_R SAG_L pC1a pC1b pC1c pC1d pC1e oviIN oviEN oviDNa oviDNa oviDNb
SAG_R* 5353954 0 0 173 7 78 0 2 0 0 0 0 1
SAG_L* 4358525 0 0 79 2 19 0 0 0 0 0 0 0
pC1a* 3807213 4 1 3 40 85 20 7 0 0 0 0 1
pC1b† 3781622 0 0 0 0 6 0 0 0 0 0 0 0
poict —-3794ie4 =O 0 5 8 ° to 2 ° 1 0 ° 
pC1d† 3778246 0 0 2 0 5 0 10 20 0 1 0 0
pCtet 12609690 ° 1 3 4 5 4 31 0 0 0 o 
oviIN* 6244095 0 0 0 0 75 0 78 0 76 20 35 29
oviEN* 1259227 0 0 2 0 0 0 6 136 0 118 42 142
oviDNa* 5143347 0 0 0 0 1 0 1 0 0 0 0 0
oviDNa* 1875105 0 0 0 0 0 0 0 0 0 0 1 0
oviDNb* 1862763 0 0 0 0 0 0 0 0 0 0 0 0


*Fully traced cells; †partially traced cells. SAG_R and SAG_L indicate right- and left-hemisphere SAG cells, respectively; all other neurons are right-hemisphere cells.

Extended Data Table 2 | oviDN inputs identified by electron microscopy reconstruction


Cell ID Hemisphere Cell type oviDNa (5143347) oviDNa (1875105) oviDNb (1862763) Total
1259227 R oviEN 118 42 142 302
6244095 R oviIN 20 35 29 84
3353966 R 31 1 27 59
4634382 R 3 38 16 57
2361058 R 19 10 15 44
1934539 L 21 2 13 36
5870279 L oviIN 6 10 11 27
5390561 R 8 0 16 24
4590002 N.D. 16 4 3 23
1879478 R 0 21 1 22
2141316 R 4 11 7 22
11122221 R 13 1 8 22
2360875 R 1 17 0 18
8460445 L 13 0 4 17
2712415 R 2 13 1 16
3243035 R 7 0 9 16
5330678 L 13 0 3 16
4295394 L 9 0 6 15
7054780 N.D. 9 0 6 15
2232454 R 0 14 0 14
2613258 R 0 13 1 14
3188249 R 8 2 4 14
41576051 R 0 0 13 13
5470288 N.D. 9 0 4 13
3588146 R 5 2 5 12
4344860 R 5 4 5 1 
5316770 R 10 4. cy) 1 
5325544 R 10 0 1 11
5431073 L 5 0 6 11
6759088 :: 8 i) 3 bhi 
7021239 L 0 11 0 11
7532739 N.D. 4 0 7 11
2255653 R 6 0 4 10
3709065 N.D. 0 8 2 10
9040679 L oviEN 0 8 2 10


Number of synaptic connections identified between various input neurons and the right-hemisphere oviDNa and oviDNb cells. R and L indicate the soma location in the right (ipsilateral) or left
(contralateral) hemisphere; N.D., soma not identified.

Proper brain function depends on neurovascular coupling: neural activity rapidly
increases local blood flow to meet moment-to-moment changes in regional brain
energy demand. Neurovascular coupling is the basis for functional brain imaging, and
impaired neurovascular coupling is implicated in neurodegeneration. The underlying
molecular and cellular mechanisms of neurovascular coupling remain poorly
understood. The conventional view is that neurons or astrocytes release vasodilatory
factors that act directly on smooth muscle cells (SMCs) to induce arterial dilation and
increase local blood flow. Here, using two-photon microscopy to image neural activity
and vascular dynamics simultaneously in the barrel cortex of awake mice under
whisker stimulation, we found that arteriolar endothelial cells (aECs) have an active
role in mediating neurovascular coupling. We found that aECs, unlike other vascular
segments of endothelial cells in the central nervous system, have abundant caveolae.
Acute genetic perturbations that eliminated caveolae in aECs, but not in neighbouring
SMCs, impaired neurovascular coupling. Notably, caveolae function in aECs is
independent of the endothelial NO synthase (eNOS)-mediated NO pathway. Ablation
of both caveolae and eNOS completely abolished neurovascular coupling, whereas the
single mutants exhibited partial impairment, revealing that the caveolae-mediated
pathway in aECs is a major contributor to neurovascular coupling. Our findings
indicate that vasodilation is largely mediated by endothelial cells that actively relay
signals from the central nervous system to SMCs via a caveolae-dependent pathway.


Despite representing only 2% of body mass, the brain uses 20% of the
body's energy at rest and has very limited ability to store energy. To
meet acute changes in regional brain energy demand, a process called
neurovascular coupling rapidly increases local blood flow following
neural activation; this is also the basis for functional brain imaging,
one of the few techniques currently available to image and measure
activity in the human brain in both health and disease.
Neurovascular coupling begins with increased neural activity
and ends with SMC relaxation leading to arteriolar vasodilation and
increased capillary blood flow. This process occurs rapidly, on the
order of hundreds of milliseconds in vivo under physiological
conditions. How signals are transmitted from neurons to SMCs is not
completely understood. The conventional view has been that following
neural activity, neurons and astrocytes release vasodilatory signals
that act directly on SMCs to relax and expand arteriolar diameter to
increase blood flow. However, recent studies have indicated that blood
vessels can also sense changes in neural activity, but the mechanisms
underlying how these endothelial cells (ECs) in the central nervous
system (CNS) mediate neurovascular coupling remain largely unknown.
Here, we demonstrate that CNS aECs actively mediate signals from
neurons to facilitate the relaxation of SMCs during neurovascular
coupling. We found that unlike other segments of ECs in the CNS
vasculature, aECs contain abundant caveolae. We used in vivo two-photon
microscopy for simultaneous measurement of neural activity and
vascular dynamics (arteriolar vessel diameter and capillary blood flow) in
the barrel cortex of awake mice following whisker stimulation. Acute
genetic perturbations that eliminated caveolae in aECs, but not in
neighbouring SMCs, impaired neurovascular coupling. Moreover, caveolae
function in aECs is independent of the eNOS-mediated NO pathway, and
ablation of both caveolae and eNOS completely abolished neurovascular
coupling, revealing that the caveolae-mediated pathway in aECs is a
major contributor to neurovascular coupling. Finally, we demonstrate
that MFSD2A, a molecular suppressor of caveolae formation, is absent
in aECs and that ectopic expression of MFSD2A specifically in aECs is
sufficient to impair neurovascular coupling. Our findings indicate that
ECs actively mediate signals from neurons and astrocytes to SMCs via a
caveolae-dependent pathway, which is a major mechanism underlying
neurovascular coupling.


CNS arteriolar ECs have abundant caveolae 

It has previously been shown that caveolae are actively suppressed in
most CNS ECs to ensure blood-brain barrier integrity. However,
this suppression of caveolae is not uniform in all segments of the CNS
endothelium as aECs have abundant caveolae (Fig. 1b, d), in contrast
to negligible numbers of caveolae in capillary ECs (cECs) (Fig. 1a, d).

Fig. 1 | CNS arterioles have abundant caveolae. a, Transmission electron
microscopy image of a CNS capillary. Pseudocolours highlight different cells:
cEC (purple), pericyte (teal), astrocyte end-foot (blue), red blood cell (RBC,
red), lumen (L) (white) and neuropil (yellow). Bottom shows an inverted,
magnified image of the boxed area in the top panel. b, Transmission electron
microscopy image of a CNS arteriole. Pseudocolours highlight different cells:
aEC (purple), SMC (green), astrocyte end-foot (blue) and neuropil (yellow).
Bottom shows a magnified image of the boxed area in the top panel.
Arrowheads point to vesicles (a, b). c, Transmission electron microscopy
images of aECs and SMCs from Cav1+/+ and Cav1-/- mice. Arrowheads point to
caveolae. d, Mean vesicular density in cECs and aECs from wild-type mice (n = 5
mice, 46 capillaries and 24 arterioles). e, f, Mean vesicular density in aECs (e)
and SMCs (f) in Cav1+/+ (n = 5 mice, 20 arterioles) and Cav1-/- mice (n = 5 mice, 28
arterioles). Data are mean ± s.e.m.; nested, unpaired, two-tailed t-test (d-f).


cECs and aECs can be distinguished under transmission electron
microscopy: capillaries have a smooth lumen whereas arterioles have
a ruffled lumen; cECs are also surrounded by pericytes, whereas aECs
are sheathed by SMCs (Fig. 1a, b). The abundant vesicles in aECs are
abolished in caveolin-1-mutant (Cav1-/-) mice, suggesting that they are
composed of caveolae (Fig. 1c, e). Caveolin 1 is an essential component
of caveolae and the endothelium of Cav1-/- mice lacks caveolae.
Thus, CNS aECs have abundant caveolae, consistent with a previous
study. Notably, many caveolae are also present in the SMCs that wrap
around the aECs (Fig. 1c, f).


Neurovascular coupling requires caveolae 

Because caveolae are specifically abundant in aECs, and CNS arterioles
are the site of vasodilation, we examined Cav1-/- mice to determine
whether caveolae are important for neurovascular coupling. To study
neurovascular coupling in vivo, we optimized a two-photon
microscope for simultaneous measurement of neural activity and vascular
dynamics, including arteriolar vessel diameter and capillary blood
flow at single-vessel resolution in awake mice (Extended Data Fig. 1).

Fig. 2 | Caveolae in CNS aECs specifically are required for neurovascular
coupling. a, Still frame images of pial arteries during neurovascular coupling in
Cavt’" and Cav" mice using in vivo two-photon microscopy. Top,
hydrazide-stained arterioles during baseline and whisker stimulation. White dashes
outline the arterioles during baseline period. Bottom, kymographs of the
arteriolar dilation, generated by transverse line scans (orange lines in top
images). The grey rectangle in the kymograph represents the whisker
stimulation period. b, Kymographs of red blood cell flow in capillaries for
Cav" and Ca mice. Dark streaks represent red blood cells, blue streaks
represent the fluorescent tracer-filled capillary lumen. Left and right
kymographs show red blood cell flow during baseline and whisker stimulation,
respectively. c-f, Time course of change in arteriolar dilation (c), change in red
blood cell velocity (d), maximum change in arteriolar dilation (e) and maximum
red blood cell velocity (f) in Cav" (n = 5 mice, 196 arterioles, 77 capillaries) and
Cav mice (n = 5 mice, 194 arterioles, 79 capillaries). g-j, Time course of
change in arteriolar dilation (g), change in red blood cell velocity (h), maximum
change in arteriolar dilation (i) and maximum red blood cell velocity (j) in
control (BMX"""";Cavt'®’, n = 7 mice, 260 arterioles, 122 capillaries) and aEC
conditional Cav1-knockout mice (BMX""*";Cavl; n = 5 mice, 193 arterioles, 94
capillaries). Data are mean ± s.e.m.; nested, unpaired, two-tailed t-test (e, f, i, j).


We focused on the barrel cortex, a well-characterized region of mouse
somatosensory cortex that processes sensory input from the
vibrissae. Sensory stimulation by whisker brushing in awake mice evoked
spatially and temporally patterned neural activity that can be imaged in
the barrel cortex by intracellular calcium levels in mice expressing the
calcium sensor GCaMP6s in neurons (Thy1-GCaMP6s) (Extended Data
Fig. 1b, c). Hydrazide and quantum dots were injected intravenously
into Thy1-GCaMP6s mice to visualize arterioles and image capillary
blood flow, respectively. Upon whisker brushing, we observed a robust
increase in the GCaMP signal in neurons, followed by arteriolar dilation
and increased red blood cell velocity (measured by tracking the
movement of red blood cells, which are devoid of quantum dots and thus
appear dark) (Extended Data Fig. 1b-g, Supplementary Videos 1-3).

Fig. 3 | Caveolae in aECs mediate neurovascular coupling independently of
eNOS. Time course of change in arteriolar dilation (a), change in red blood cell
velocity (b), maximum percentage change in arteriolar dilation (c) and
maximum percentage change in red blood cell velocity (d) in Cav"“Nos3**
(n = 5 mice, 148 arterioles, 76 capillaries), Cavl”"Nos3"” (n = 5 mice, 128
arterioles, 68 capillaries), Cavl'"Nos3 (n = 5 mice, 137 arterioles, 73
capillaries) and Cav Nos3“ mice (n = 5 mice, 139 arterioles, 74 capillaries).
Data are mean ± s.e.m.; nested, one-way analysis of variance (ANOVA) with a
post hoc Bonferroni multiple comparison adjustment (c, d).


Finally, in contrast to the robust vasodilation observed in the barrel
cortex (using our in vivo whisker stimulation paradigm), the
retrosplenial cortex (a brain region not associated with processing
whisking) exhibits very low levels of vasodilation (Extended Data Fig. 1h, i).
This result indicates that the changes in arterial vessel diameter are
a result of the whisker-stimulus-dependent neural activity and not
systemic variables.

Cav1-/- mice exhibited attenuated arteriolar dilation upon whisker
stimulation, whereas arterioles from wild-type Cav1+/+ and
heterozygous Cav1+/- mice dilated robustly (Fig. 2a, c, e, Extended Data Fig. 3a, i,
Supplementary Video 4). Moreover, this vasodilation defect was
observed in both pial arteries and penetrating arterioles diving deep
into the parenchyma in Cav1-/- mice compared with their wild-type
littermates (Extended Data Fig. 2). Notably, the baseline diameter and
latency to dilate were similar across the three genotypes (Extended
Data Fig. 3b-d). These results suggest that the absence of caveolae
does not impair basal vessel tone and kinetics but specifically impairs
the amplitude of sensory-evoked arteriolar dilation. Consistent with
the attenuation of arteriolar vasodilation, capillary blood flow was
also impaired in mutant Cav1-/- mice upon whisker stimulation
compared to control mice (Fig. 2b, d, f, Supplementary Video 5), whereas
baseline capillary velocity and kinetics were similar across genotypes
(Extended Data Fig. 3e, f). Moreover, the attenuated arteriolar
dilation and capillary blood flow in mutant Cav1-/- mice were not due to
either an impairment in sensory-evoked neural activity or an alteration
in blood pressure, because control and mutant mice display similar
GCaMP6s dynamics in neurons (Extended Data Fig. 3g, h) and similar
systolic, diastolic and mean blood pressure (Extended Data Fig. 3j). The
normal blood pressure observed in Cav1-/- mutant mice is consistent
with previous studies. Thus, these results demonstrate that caveolae
are essential for optimal neurovascular coupling.

Because SMCs control arteriolar dilation during neurovascular
coupling, we next examined whether the attenuated neurovascular
coupling in Cav1−/− mice is due to impaired integrity and function of
SMCs. To visualize SMC morphology and vessel coverage, we intravenously
injected hydrazide into control Cav1+/+;NG2-DsRed+ and
mutant Cav1−/−;NG2-DsRed+ mice. NG2-DsRed is a reporter for SMCs,


108 | Nature | Vol 579 | 5 March 2020 


oligodendrocytes and pericytes. By quantifying the number
of DsRed+ cells on hydrazide+ arterioles, we established that there is
normal coverage and morphology of SMCs in Cav1−/− mice compared
with wild type (Extended Data Fig. 4a, b). Moreover, we found similar
expression of various contractile proteins, including α-smooth muscle
actin (SMA), MYH11, transgelin and desmin, in SMCs in Cav1−/− mice and
wild-type littermates (Extended Data Fig. 4c-g).

To examine whether ablation of caveolae affects the ability of SMCs
to respond to contractile and vasodilatory signals, we imaged arteriolar
diameter changes in acute brain slices under two-photon microscopy
after delivery of contractile and vasodilatory pharmacological compounds.
We found that SMCs in Cav1−/− mice displayed normal contraction
compared with wild-type controls following administration of
U46619, a thromboxane A2 receptor agonist (Extended Data Fig. 4h, i,
Supplementary Videos 6, 7). When diethylamine (DEA) NONOate, a NO
donor, was subsequently applied to the same vessel, we observed a similar
level of dilation as in the wild-type controls (Extended Data Fig. 4h, j,
Supplementary Videos 6, 7). Finally, to examine whether the impaired
neurovascular coupling in mutant Cav1−/− mice is due to the inability of
SMCs to relax following release of vasodilatory signals in vivo, we used
two-photon microscopy in anaesthetized mice and imaged changes in
vasodilation upon superfusing DEA NONOate onto the pia of control
Cav1+/+ and mutant Cav1−/− mice. Upon acute administration of DEA
NONOate, the SMCs from both control Cav1+/+ and mutant Cav1−/− mice relaxed
and dilated arterioles to similar levels in vivo (Extended Data Fig. 4k, l,
Supplementary Videos 8, 9). These experiments demonstrate that
the absence of caveolae impairs neurovascular coupling despite the
presence of functionally normal SMCs.


Neurovascular coupling requires aEC caveolae 
Caveolae are present in both aECs and SMCs in the CNS (Fig. 1c, f). We next
tested whether caveolae function in a cell-autonomous manner using
acute, cell-type-specific deletion of Cav1 in adult mice. First, we crossed
BMX-creER mice, a tamoxifen-inducible aEC-specific driver line, with
Cav1-floxed mice to acutely ablate caveolae only in aECs. After
tamoxifen treatment, CAV1 was specifically lost in aECs but not in SMCs
of mutant BMX-creER;Cav1-floxed mice, whereas CAV1 protein was present
in both aECs and SMCs in control mice (Extended Data Fig. 5a, c).
Transmission electron microscopy analysis showed that in the barrel
cortex of mutant BMX-creER;Cav1-floxed mice, caveolae were ablated acutely
in aECs but were still present in SMCs, whereas abundant caveolae were
present in both aECs and SMCs in the CNS of control
mice (Extended Data Fig. 5b, d). Using our in vivo imaging paradigm,
we found attenuation in arteriolar dilation (Fig. 2g, i, Extended Data
Fig. 6a) and capillary blood flow (Fig. 2h, j) in mutant
BMX-creER;Cav1-floxed mice on whisker stimulation, similar to
observations in the Cav1−/− mutant mice (Fig. 2a-f). Baseline and kinetics
of arteriolar diameter and capillary blood flow were unaffected in
tamoxifen-treated BMX-creER;Cav1-floxed mice
(Extended Data Figs. 5e, f, 6b-d). Together, these experiments demonstrate
that caveolae in aECs are important for neurovascular coupling.


SMC caveolae are dispensable 

To test the role of SMC caveolae in neurovascular coupling, we crossed
Myh11-creER mice, a tamoxifen-inducible SMC driver line, to Cav1-floxed
mice to acutely ablate caveolae in SMCs. After tamoxifen treatment,
we found that both CAV1 protein and caveolae were ablated successfully
in SMCs but preserved in aECs in mutant Myh11-creER;Cav1-floxed mice
(Extended Data Fig. 7a-d). However, in contrast to the attenuated
neurovascular coupling observed in mice lacking CAV1 in aECs, no
impairment in arteriolar dilation or capillary blood flow after whisker
stimulation was observed in mutants lacking CAV1 and caveolae in
SMCs (Extended Data Figs. 6e-h, 7e-j), indicating that caveolae in SMCs
have a negligible role, if any, in neurovascular coupling.


Caveolae in aECs function independently of eNOS
Next, we examined how aECs utilize caveolae to mediate neurovascular
coupling. Caveolae have been implicated in many cellular processes,
including transcytosis, serving as a membrane reservoir during mechanical
stretch, clustering receptors and ion channels, and mediating intracellular
signalling. We focused on NO signalling because NO is a major vasodilatory
factor in neurovascular coupling and previous studies have reported that
CAV1 interacts physically with eNOS (encoded by Nos3). We first examined
whether eNOS and NO levels are altered in the absence of caveolae.
Unexpectedly, we found similar levels of eNOS protein and NO in aECs in
wild-type control and Cav1−/− mice, whereas both eNOS protein and NO signal
were absent in Nos3−/− mice (Extended Data Fig. 8a-d). To examine genetic
interactions between Cav1 and Nos3, we characterized neurovascular coupling
in Cav1−/−;Nos3−/− double-mutant mice. We reasoned that if CAV1 and eNOS
are in the same genetic pathway, the double mutants should phenocopy
one of the single-knockout mice, whereas if they function in separate
parallel pathways, the double knockout should have an additive phenotype
of the two single-knockout mice. Using our in vivo imaging
paradigm, Nos3−/− mice displayed attenuated arteriolar dilation and
capillary blood flow upon whisker stimulation (Fig. 3a-d), consistent with
a previous report. The Cav1−/−;Nos3−/− double-mutant mice completely
lost arteriolar dilation and red blood cell velocity enhancement upon
whisker stimulation, compared with a partial reduction in Cav1−/− and
Nos3−/− single mutants, despite having normal baseline diameter and
blood flow (Fig. 3a-d, Extended Data Fig. 8e, f). These results demonstrate
that caveolae-mediated neurovascular coupling is independent
of the NO pathway. Moreover, the caveolae-mediated pathway is at least
as important as the NO pathway for neurovascular coupling.


MFSD2A downregulates neurovascular coupling
We next investigated why cECs have few caveolae whereas aECs have abundant
caveolae. It was previously discovered that MFSD2A expression in CNS
cECs actively suppressed caveolae formation and that this was necessary for
blood-brain barrier integrity. Using immunohistochemistry, we found
that MFSD2A protein was undetectable in aECs in both brain and retina
(Fig. 4a, d, Extended Data Fig. 9a-d). Consistent with this result, Mfsd2a
transcript levels are also low in aECs compared with cECs. Thus, CNS cECs
robustly express Mfsd2a to suppress caveolae, whereas aECs lack MFSD2A
and are enriched in caveolae. We therefore examined whether ectopic
expression of Mfsd2a specifically in CNS aECs is sufficient to suppress
caveolae in these cells; if so, our results so far predict that this
suppression of caveolae in aECs would result in attenuated neurovascular
coupling.

To ectopically express Mfsd2a in aECs only, we generated a transgenic
mouse in which Mfsd2a expression is Cre-dependent (referred
to as R26-LSL-Mfsd2a) (Extended Data Fig. 10a, b) and crossed it to
BMX-creER mice. After tamoxifen treatment, MFSD2A protein was expressed
abundantly in aECs in brains from BMX-creER;R26-LSL-Mfsd2a mice, whereas
control adult mice lacked MFSD2A expression in brain
arterioles (Fig. 4b, e). Moreover, electron microscopy analysis revealed
that caveolae density was reduced significantly in BMX-creER;R26-LSL-Mfsd2a
mice relative to control (Fig. 4c, f). Of note, in vivo imaging revealed
an attenuation of arteriolar dilation upon whisker stimulation in mice
with MFSD2A overexpression in aECs (BMX-creER;R26-LSL-Mfsd2a) relative
to control mice (Fig. 4g-i, Extended Data
Fig. 10e). These experiments demonstrate that ectopic overexpression
of MFSD2A in CNS aECs is sufficient to reduce caveolae density and
impair neurovascular coupling. Furthermore, inhibiting caveolae
vesicles specifically in aECs by two different approaches (overexpression
of MFSD2A and genetic deletion of Cav1) resulted in attenuated
neurovascular coupling in both cases, demonstrating the importance
of caveolae in CNS aECs for mediating neurovascular coupling.


Discussion 


We used natural stimuli under physiological conditions in awake mice while
simultaneously measuring neural activity and vascular dynamics under
two-photon microscopy to study mechanisms underlying neurovascular
coupling. We discovered that caveolae in CNS aECs have a key role in
mediating neurovascular coupling. In addition, we confirmed that the
previously reported eNOS pathway also has a role in neurovascular coupling.
However, we found that the caveolae-mediated pathway is independent of
eNOS signalling, as the perturbation of both caveolae- and eNOS-mediated
pathways together completely abolished neurovascular coupling, whereas
ablation of each pathway alone resulted in partial impairment. Thus, these
findings indicate that the caveolae-mediated pathway is at least as
important as the NO pathway for neurovascular coupling.

Previous studies highlighted the importance of ECs in neurovascular
coupling in vivo. Locally disrupting ECs using optically induced reactive
oxygen species halted propagation of stimulus-evoked vasodilation in pial
arteries. Our present findings have extended this work to identify and
demonstrate specific molecular and subcellular components in aECs that
are essential for neurovascular coupling using cell-type-specific genetic
manipulations. Given the recent evidence that cECs are involved in sensing
neural activity changes and are important for neurovascular coupling,
we propose that after sensing nearby increased neural activity, cECs relay
this signal electrically to the upstream ECs, which in turn send vasodilatory
cues to SMCs via a caveolae-dependent process. We considered how caveolae
could carry out this function. Although caveolae have been reported to
serve as a membrane reservoir during mechanical stretch, the attenuated
vasodilation observed in Cav1−/− mice is unlikely to result from impaired
arteriolar elasticity, given that superfusing NO donor in vivo (which dilates
arterioles by directly relaxing SMCs) produced similar dilations in wild-type
and Cav1−/− mice (Extended Data Fig. 4k, l, Supplementary Videos 8, 9).
Similarly, previously described interactions between caveolae and the
eNOS-signalling pathway cannot explain the impaired vasodilation observed in
Cav1−/− mutant mice, because simultaneous ablation of both caveolae- and
eNOS-mediated pathways abolishes neurovascular coupling, whereas
ablation of either pathway alone results in only partial impairment. In light
of these results, the ability of caveolae to cluster ion channels and
receptors, several of which have been implicated in vasodilation, probably
explains their role in neurovascular coupling. In aECs, caveolae could cluster
these channels to facilitate transmission of vasodilatory signals to SMCs.
Identifying the channels that cluster in caveolae in ECs will be an important
next step for the field to address.

Our results also demonstrate that ECs from different vascular segments
in the CNS exhibit heterogeneity at molecular, cellular and functional
levels. Here we show that this endothelial heterogeneity governs the two
unique and important functions of the CNS vasculature: the blood-brain
barrier and neurovascular coupling. We found that cECs express MFSD2A,
which suppresses caveolae to ensure blood-brain-barrier integrity. By
contrast, aECs lack MFSD2A and concurrently have abundant caveolae,
which are important for neurovascular coupling. We expect this kind of
heterogeneity to exist broadly in CNS ECs and that understanding this
heterogeneity will advance our understanding of the diverse functions of
ECs in health and disease. Given that neurovascular coupling is impaired
in various neurological disorders, future studies examining whether
these different molecular and cellular pathways are altered in disease
may provide insight for development of novel therapies.
Methods


Mice 
All mouse experiments were approved by the Harvard University 
Institutional Animal Care and Use Committee (IACUC). The following 
mouse strains were used: wild type (C57BL/6J, Jackson Laboratory, no.
000664), Mfsd2a°"* (ref."), BMX-creER (ref. °), Myh11-creER (ref.), NG2-DsRED
(ref.”; JAX, no. 008241), Ai14 (ref.”; JAX, no. 007914), Ai39 (ref.;
JAX, no. 014539), Ai75 (ref. *; JAX, no. 014539), ROSA26-LSL-Mfsd2a
(generated during this study), Thy1-GCaMP6s (JAX, no. 024275), Cav1−/−
(JAX, no. 007083), Cav1-floxed, Nos3−/− (JAX, no. 002684) and ROSA26-PhiC31
(ref. ”; JAX, no. 007743). All mice were maintained on a mixed
background and both males and females were used. For adult mice
expressing creER, tamoxifen (Sigma-Aldrich, T5648) was dissolved in
corn oil at a concentration of 20 mg ml−1 and injected into the peritoneal
cavity at 0.2 mg g−1 body weight. Six- to seven-week-old mice were
treated with tamoxifen for five consecutive days and were allowed to
recover for one week following the last tamoxifen treatment before
cranial surgery or dissections were performed. Randomization was
determined by mouse genetics, as wild types, mutants and transgenic
mice were assigned randomly into their respective genotype group.
Sample sizes were determined by a power calculation on the basis of
previous pilot data and representative sample sizes from previous
literature that had similar experiments. In experiments involving
mutant and transgenic mice, the genotypes were blinded until after
data acquisition and analysis.
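
The tamoxifen dose stated above (0.2 mg per g body weight from a 20 mg ml−1 stock) implies an injection volume of 10 µl per g. The short sketch below works through that arithmetic; the function name and the example body weights are illustrative assumptions, not values from the study.

```python
def tamoxifen_injection_volume_ul(body_weight_g,
                                  dose_mg_per_g=0.2,
                                  stock_mg_per_ml=20.0):
    """Return the injection volume in microlitres for one tamoxifen dose.

    dose_mg_per_g and stock_mg_per_ml default to the values in the text;
    body_weight_g is supplied by the user (hypothetical here).
    """
    dose_mg = dose_mg_per_g * body_weight_g   # total drug mass per injection
    volume_ml = dose_mg / stock_mg_per_ml     # volume of stock holding that mass
    return volume_ml * 1000.0                 # convert ml to microlitres


if __name__ == "__main__":
    # Hypothetical body weights for six- to seven-week-old mice.
    for weight_g in (18.0, 20.0, 25.0):
        volume = tamoxifen_injection_volume_ul(weight_g)
        print(f"{weight_g:.0f} g mouse: {volume:.0f} ul per injection")
```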


Generation of ROSA26-LSL-Mfsd2a transgenic mice 

The targeting vector contains a CAG promoter and loxP-3xSV40PA-loxP
followed by mouse Mfsd2a cDNA and WPRE-PolyA. A positive selection
cassette, attB-PGKNeoR-attP, is located between the insertion and the
3′ homologous arm, which is 4.3 kb. The length of the 5′ homologous
arm is 1.1 kb (Extended Data Fig. 10).

The targeting vector was electroporated into embryonic stem
(ES) cells derived from F1 hybrid blastocysts of 129S6 × C57BL/6J. The
G418-resistant ES clones were screened by nested PCR using primers
outside the construct paired with primers inside the insertion cassette.
The positive ES cell clones were used to generate chimeric mice
by aggregating with 8-cell embryos of the CD-1 strain. The attB-Neo-attP
cassette was removed in mice by crossing the chimaeras with R26PhiC31
females (JAX, no. 007743) backcrossed in C57BL/6J for 13 generations.
The F1 pups were genotyped by PCR using the primer set
(5′-CCAAAGTCGCTCTGAGTTGT-3′), (5′-CCAGGTTAGCCTTTAAGCCT-3′) and
(5′-CGGGCCATTTACCGTAAGTT-3′). The PCR products are 250 bp for
the wild-type allele and 329 bp for the mutant allele.


Long-term cranial window surgery 

Six-week- to four-month-old mice underwent a craniotomy, implantation
of a sterile glass window (3.0 mm) and attachment of a customized
titanium head plate to the skull using dental cement (Metabond,
Parkell). Prior to the craniotomy, an intramuscular dose of dexamethasone
(120 mg kg−1) was administered. Mice were anaesthetized with
3-5% isoflurane and maintained at 1-2% isoflurane for the duration
of the craniotomy. The respiration rate and body temperature were
continuously monitored throughout the procedure to ensure the
appropriate level of anaesthesia. A subcutaneous dose of the analgesics
buprenorphine (0.1 mg kg−1) and ketoprofen (5.0 mg kg−1) was
administered at the onset of the procedure and was also administered
daily for two additional days after the craniotomy. A single dose of the
local anaesthetic lidocaine (20 mg kg−1)/bupivacaine (2.5 mg kg−1) was
administered subcutaneously at the site of the craniotomy. The centre
of the craniotomy over the barrel cortex and retrosplenial cortex was
determined for each mouse in relation to the skull indentations bregma
and lambda. Generally, for barrel cortex surgery, 3.5-3.8 mm posterior
and 1.5 mm laterally from the midpoint between bregma and lambda


along the sagittal suture was marked as the centre of the craniotomy.
For retrosplenial cortex surgery, the centre of the craniotomy was
3.2-3.5 mm posterior and 1.0 mm lateral from bregma. Following the
craniotomy and the window with head plate implantation, mice were
treated with buprenorphine/ketoprofen and observed for signs of pain
and/or infection for 72 h. Furthermore, mice were handled by the
experimenter, habituated to head restraint and trained to run on a foam ball
daily for three consecutive days. Sensory-evoked arteriolar dilation and
capillary blood flow were imaged through a 3.0-mm-diameter cranial
window positioned over the somatosensory cortex in head-restrained
mice with the freedom to walk on a bidirectional styrofoam ball.


Two-photon microscopy 

Two-photon imaging was performed using a custom-built microscope
equipped with a tunable Ti:sapphire laser (MaiTai HP DS, Spectra-Physics)
controlled by ScanImage 5.1 (Vidrio Technologies). The intensity of
the femtosecond pulsed infrared beam was controlled by an electro-optical
modulator (Conoptics) and passed through a pair of scan mirrors
(Cambridge Technology) that enabled image acquisition at 30 Hz
for a field of view of 1.0 mm² and 512 × 512 pixels. Control of image zoom
was enabled by controlling the resonant scanner amplitude. The objective
lens used was a 16×, 0.8 NA, water-immersion lens (Nikon). Green
and red fluorescence photons were separated using a custom-sized
dichroic beamsplitter (580 BrightLine, Semrock) and two custom-sized
single-band bandpass filters (525/50 nm BrightLine, 641/75 nm
BrightLine, Semrock). Fluorescence photons were collected using
photomultiplier tubes (Hamamatsu).


In vivo imaging of pial arteriolar dilation and analysis

Arterioles labelled with Alexa Fluor 633 Hydrazide were imaged at 800
nm with a field of view size of 200 μm × 200 μm (512 × 512 pixels, pixel
size of 0.16 μm² per pixel) at 30 Hz. Whisker stimulation (4 Hz, 5 s) was
performed using a foam brush controlled by a servo motor under the
control of WaveSurfer. Alexa Fluor 633 Hydrazide (5 mg kg−1) was
intravenously injected into mice to visualize arterioles in vivo. We imaged
surface-level pial arteries and arterioles from the middle cerebral artery.
Our selection of arteries and arterioles was guided by hydrazide+ vessels,
as hydrazide labels arteries and arterioles only. Because we are stimulating
the entire whisker pad (Extended Data Fig. 1) as opposed to stimulating
individual whiskers, we observed changes in all sampled branches.
This is consistent with previous findings that branch orders were
not necessarily relevant as long as the vessels were arterial (which they
defined as SMA+ or hydrazide+). Three technical trials were acquired
and averaged for each field of view. Ten to thirteen fields of view were
acquired per imaging session. Three imaging sessions were collected
on three separate days per mouse and arteriolar dilation responses
were averaged across all three sessions for each mouse. To determine
per cent change in diameter relative to baseline, the time series were
first filtered with a Gaussian blur and background subtracted with a
rolling ball of 50 pixels. Five-line scans orthogonal to the arterioles were
sampled to generate kymographs. The two maximum intensity peaks
(which represent the walls of the arterioles) were identified across the
kymograph. The change in diameter of the arterioles was determined
as $(\mathrm{diameter}_{\mathrm{time}} - \mathrm{diameter}_{\mathrm{baseline}})/\mathrm{diameter}_{\mathrm{baseline}}$.
$\mathrm{Diameter}_{\mathrm{baseline}}$ was determined as the mean diameter during
the 3 s before the whisker stimulation. $\mathrm{Diameter}_{\mathrm{time}}$ is the
vessel diameter at a particular time.
The change in maximum diameter was determined as the maximum
value during the whisker stimulation. To determine latency onset to
dilate, a line was fitted through 80% and 20% of the maximum value.
The latency onset to dilate was considered the time difference between
the x-intercept of the line and the start of the whisker stimulation.
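
The baseline normalization and latency estimate described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the "line fitted through 80% and 20% of the maximum value" is interpreted here as the straight line through the two points where the rising trace first crosses 20% and 80% of its peak.

```python
def relative_change(diameters, times, stim_start, baseline_s=3.0):
    """Return (d - baseline) / baseline for every sample.

    The baseline is the mean diameter in the baseline_s seconds
    immediately before stimulus onset.
    """
    baseline_vals = [d for d, t in zip(diameters, times)
                     if stim_start - baseline_s <= t < stim_start]
    baseline = sum(baseline_vals) / len(baseline_vals)
    return [(d - baseline) / baseline for d in diameters]


def peak_and_latency(trace, times, stim_start, stim_end):
    """Return (peak change during stimulation, latency onset in seconds)."""
    stim_idx = [i for i, t in enumerate(times) if stim_start <= t <= stim_end]
    peak_i = max(stim_idx, key=lambda i: trace[i])
    peak = trace[peak_i]

    def crossing_time(level):
        # First time the rising trace crosses `level`, linearly interpolated.
        for i in range(stim_idx[0] + 1, peak_i + 1):
            if trace[i - 1] < level <= trace[i]:
                frac = (level - trace[i - 1]) / (trace[i] - trace[i - 1])
                return times[i - 1] + frac * (times[i] - times[i - 1])
        raise ValueError("trace never crosses the requested level")

    t20 = crossing_time(0.2 * peak)
    t80 = crossing_time(0.8 * peak)
    slope = (0.8 * peak - 0.2 * peak) / (t80 - t20)
    t_zero = t20 - 0.2 * peak / slope      # x-intercept of the fitted line
    return peak, t_zero - stim_start


if __name__ == "__main__":
    # Synthetic trace: 10 um baseline, linear ramp to 12 um between 4 s and 6 s.
    times = [i / 10 for i in range(101)]
    diam = [10.0 if t < 4 else (10.0 + (t - 4.0) if t < 6 else 12.0)
            for t in times]
    trace = relative_change(diam, times, stim_start=3.0)
    print(peak_and_latency(trace, times, stim_start=3.0, stim_end=8.0))
```

The same two functions apply unchanged to red blood cell velocity traces, since that analysis uses the same baseline window and latency definition.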


In vivo imaging of parenchymal arteriolar dilation and analysis
Arterioles stained with Alexa Fluor 633 Hydrazide (5 mg kg−1) were
imaged at multiple depths within the barrel cortex. Three parenchymal
depths were used: 100 μm, 200 μm and 300 μm from the pial surface.
Three technical trials were acquired and averaged for each field of view
at each depth location. Three to four diving arterioles were imaged per
imaging session and three sessions on sequential days were recorded
in total per mouse. For analysis of parenchymal arteriolar dilation, in which
the vessels appear as ellipses, we tracked the diving arterioles by fitting an
ellipse to the hydrazide signal. Movies were first averaged using a three-frame
rolling average and then smoothed with a Laplacian of Gaussian filter.
Intensities of the smoothed movie were rescaled to range from 0 to
255 and a threshold pixel intensity was picked to exclude background
fluorescence signal outside of the ring of hydrazide signal surrounding
the arteriole. Next, ridge detection was performed using the Fiji
ridge detection plugin. Three separate ridge-detection parameter
sets were used and the ridges were combined to detect bright and dim
ridges. Next, an initial estimate of the ellipse fit was determined using
an elliptical Hough transform on the binarized ridge image. The fit
was then refined by minimizing the distance between the ridge pixels
and the fitted ellipse using the Hough transform results as the starting
parameter values. The ellipse was parameterized as follows:


$$x(\alpha) = a\cos(\alpha)\cos(\theta) - b\sin(\alpha)\sin(\theta) + x_0,$$
$$y(\alpha) = a\cos(\alpha)\sin(\theta) + b\sin(\alpha)\cos(\theta) + y_0,$$


in which $x(\alpha)$, $y(\alpha)$ are the x and y coordinates of the points on
the ellipse, $a$ and $b$ are the two ellipse axes, $\theta$ is the tilt angle of
the ellipse, $x_0$ and $y_0$ are the coordinates of the ellipse centre, and
$\alpha$ ranges from 0 to $2\pi$ to circumscribe the entire ellipse perimeter.
Minimization was done in MATLAB using the lsqnonlin function and 95%
confidence intervals of the parameters were determined using nlparci. To be
consistent with the pial artery dilation tracking, we report changes in diameter
of the parenchymal arterioles ($\Delta D/D_{\mathrm{baseline}}$) from the minor
axis of the ellipse, which does not depend on the orientation of the arteriole
cross section relative to the microscope optical axis. As image quality is
variable, frames with poor fitting accuracy were discarded by first rejecting
frames in which the minor axis fit was more than two median absolute
deviations from the ten-frame sliding window median. Next, frames
that had a confidence interval greater than four pixels for the minor
axis were discarded. If fewer than 50% of the frames of the trajectory
remained, the entire trajectory was discarded. Finally, the trajectory was
smoothed using the MATLAB smooth function with rlowess. To obtain
$\Delta D/D_{\mathrm{baseline}}$, this value was multiplied by two and divided by
the mean diameter during the 3 s immediately before whisker stimulation. The
three technical replicates were averaged as with the pial artery dilation
experiments to produce the trajectory for the vessel.
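
The frame-rejection rules described above (a sliding-window median absolute deviation test, a confidence-interval cut-off and a 50% retention requirement) can be illustrated with a short sketch. It is written in Python rather than MATLAB and is not the authors' code; the function name, the centring of the ten-frame window and the treatment of trajectory edges are assumptions made for illustration.

```python
from statistics import median


def filter_trajectory(minor_axis, ci_width, window=10, n_mad=2.0,
                      max_ci=4.0, min_keep=0.5):
    """Return a list of keep/discard flags, or None if the trajectory fails.

    minor_axis: fitted minor-axis length per frame (pixels).
    ci_width:   confidence-interval width of that fit per frame (pixels).
    """
    n = len(minor_axis)
    keep = []
    for i in range(n):
        # Ten-frame window roughly centred on frame i, clipped at the edges
        # (the centring is an assumption; the text only gives the length).
        lo = max(0, i - window // 2)
        hi = min(n, lo + window)
        lo = max(0, hi - window)
        win = minor_axis[lo:hi]
        med = median(win)
        mad = median(abs(v - med) for v in win)
        within_mad = abs(minor_axis[i] - med) <= n_mad * mad
        within_ci = ci_width[i] <= max_ci
        keep.append(within_mad and within_ci)
    # Discard the whole trajectory if fewer than half of the frames survive.
    if sum(keep) < min_keep * n:
        return None
    return keep


if __name__ == "__main__":
    axis = [10, 11, 9, 10, 11, 30, 9, 10, 11, 9, 10, 11]   # frame 5 is an outlier
    ci = [1, 1, 1, 6, 1, 1, 1, 1, 1, 1, 1, 1]              # frame 3 has a wide CI
    print(filter_trajectory(axis, ci))
```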


In vivo imaging of capillary red blood cell velocity and analysis

Mice were intravenously injected with quantum dots 525 (Thermo
Fisher Scientific) and Alexa Fluor 633 Hydrazide to unambiguously
distinguish arterioles and capillaries in vivo. Hydrazide-negative
capillaries were imaged with a field of view size of 100 μm × 25 μm (512 × 25
pixels, pixel size of 0.04 μm² per pixel) at 610 Hz. Whisker stimulation
(4 Hz, 5 s) was performed using a foam brush controlled by a servo
motor under the control of WaveSurfer. Three technical trials were
acquired and averaged for each field of view. Three imaging sessions
were collected on three separate days per mouse and changes in
capillary red blood cell velocity were averaged across all three sessions
for each mouse. To determine per cent change in velocity relative to
baseline, the movies were first filtered with a Gaussian blur and
background subtracted with a rolling ball of 50 pixels. Five-line scans parallel
to the flow of red blood cells were sampled to generate kymographs.
Using a published algorithm that uses an iterative radon transform
and edge detection filter, the change in velocity of red blood cells was
determined as $(\mathrm{velocity}_{\mathrm{time}} - \mathrm{velocity}_{\mathrm{baseline}})/\mathrm{velocity}_{\mathrm{baseline}}$.
$\mathrm{Velocity}_{\mathrm{baseline}}$ was defined as the mean velocity during the 3 s before the whisker


stimulation. $\mathrm{Velocity}_{\mathrm{time}}$ is the velocity of the red blood cell flow at that
moment in time. The change in maximum velocity was determined as the
maximum value during the whisker stimulation. To determine latency
onset to increase red blood cell velocity, a line was fitted through 80%
and 20% of the maximum velocity value. The latency onset was considered
the time difference between the x-intercept of the fitted line and
the start time of the whisker stimulation.


In vivo two-photon imaging and pharmacology

To assess vasodilatory function of pial artery SMCs, we imaged pial
artery diameter changes with two-photon microscopy in response
to topical application of the nitric oxide donor DEA NONOate (EMD
Millipore). Arteries were stained with Alexa Fluor 633 Hydrazide, as
described earlier, one day before the imaging session. Under isoflurane
anaesthesia (1.0-1.5%), a titanium head-plate with a 10-mm-diameter
hole centred over the right parietal skull bone was cemented onto the
skull using Metabond. A custom perfusion system was constructed
along the rim of the head-plate hole to allow for simultaneous imaging
and application of DEA NONOate. A 5.0-7.0-mm circular craniotomy
of the right parietal skull bone was carefully performed and the newly
exposed cortex was kept submerged in artificial cerebral spinal fluid
(aCSF) for the duration of the experiment. The mouse was then carefully
transitioned from isoflurane (0.5%) to ketamine/xylazine (100 mg kg−1)
and transferred from the surgical stereotaxic stage to the imaging
platform, where body temperature was maintained at 37 °C using a heat pad.
A fresh 1.0 μM DEA NONOate solution was prepared immediately before
the start of the imaging session given the short half-life of DEA NONOate
in aqueous solution. A syringe pump (Harvard Apparatus) controlled
the application of the DEA NONOate solution onto the exposed cortex
at 1.0 ml min−1. Five recordings (30 s in duration) of each pial vessel per
mouse were collected in series, interleaved with 30 s of washing with
aCSF. The five measurements were then averaged per mouse.


Ex vivo acute slice two-photon imaging and pharmacology
Acute coronal brain slices were prepared by deeply anaesthetizing
mice with isoflurane inhalation followed by cardiac perfusion with
ice-cold choline-based cut solution containing (in mM): 25 NaHCO3, 25
glucose, 1.25 NaH2PO4, 7 MgCl2, 2.5 KCl, 0.5 CaCl2, 11.6 ascorbic acid,
3.1 pyruvic acid and 110 choline chloride. After brain dissection and
blocking, 300-μm slices were prepared in cut solution with a Leica
VT1000 S vibratome. Slices were then transferred for 30 min of recovery
into a holding chamber containing aCSF at 34 °C, composed of (in mM):
125 NaCl, 2.5 KCl, 1.25 NaH2PO4, 25 NaHCO3, 11 glucose, 2 CaCl2 and
1 MgCl2. During recovery, slices were incubated with approximately
1 μM Alexa Fluor 633 Hydrazide (ThermoFisher). Following recovery,
slices were imaged while constantly perfused with room temperature
aCSF. Choline cut solution and aCSF were constantly bubbled with
5% CO2/95% O2. Imaging was performed on a custom-built two-photon
microscope and images were acquired with a custom version of ScanImage
written in MATLAB (Mathworks). During imaging, arteries were
constricted with 100 nM U46619 (Sigma-Aldrich) and dilated by acutely
dissolving about 5 mg of DEA NONOate (EMD Millipore) into the 10 ml
of recycling aCSF being perfused over the slice.


Non-invasive blood pressure measurement procedure in awake mice

Systolic, diastolic and mean blood pressure were measured using the
non-invasive tail-cuff method (CODA Monitor, Kent Scientific). At the
start of a blood pressure measurement session, the appropriate mouse
holder was selected on the basis of the mouse's weight. The holder was
placed over a heating pad with a set temperature that is regulated at
38 °C. The tail-cuff and volume pressure recording sensor were placed
on the heating pad and covered with a blanket to allow these components
to reach the set temperature. After 2-3 min, the awake mouse
was gently introduced into the holder. A light blanket was draped over


the tail and the mouse was left alone for 3-5 min to allow habituation.
The blood pressure measurements took place via 10-20 tail-cuff
inflation-deflation sweeps that in total took 5-10 min.
Multiple days of measurements may be required to gain confidence
in the accuracy of the measurements. After the measurements, the
mouse was carefully removed from the holder and immediately placed
in its cage. Measurements of all sweeps were then averaged per mouse.


SMC coverage quantification

NG2-DsRed (or CSPG4-DsRed) mice were crossed to Cav1−/− mice. Cranial
window surgeries were performed over the barrel cortex of Cav1+/+;NG2-DsRed
and Cav1−/−;NG2-DsRed mice. Mice were injected with Alexa Fluor 488
Hydrazide (5 mg kg−1). Around 30 arterioles (hydrazide+DsRed+) per
mouse were imaged. A 100-μm intensity line profile was drawn
perpendicularly to the contractile bands of the SMCs. Maximum peaks
corresponding to individual smooth muscle cells were counted and the
number of SMCs per 100 μm length was determined.
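
Counting intensity peaks along a line profile, as described above, can be sketched in a few lines. This is an illustrative assumption of how such counting might be automated, not the procedure used in the study: the function names and the `min_height` threshold are hypothetical, and the text does not say whether peaks were counted manually or by software.

```python
def count_peaks(profile, min_height):
    """Count local maxima at or above min_height in a 1D intensity profile.

    A flat-topped peak (plateau) is counted once.
    """
    count = 0
    i = 1
    n = len(profile)
    while i < n - 1:
        if profile[i] > profile[i - 1] and profile[i] >= min_height:
            # Walk across any plateau at this height.
            j = i
            while j < n - 1 and profile[j + 1] == profile[i]:
                j += 1
            # It is a peak only if the signal falls after the plateau.
            if j < n - 1 and profile[j + 1] < profile[i]:
                count += 1
            i = j + 1
        else:
            i += 1
    return count


def smcs_per_100_um(n_peaks, profile_length_um):
    """Normalize a peak count to cells per 100 um of vessel length."""
    return n_peaks * 100.0 / profile_length_um


if __name__ == "__main__":
    # Toy profile: three bands above threshold and one dim bump below it.
    profile = [0, 5, 0, 6, 0, 1, 0, 7, 7, 0]
    peaks = count_peaks(profile, min_height=3)
    print(peaks, smcs_per_100_um(peaks, 100.0))
```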


Transmission electron microscopy 

Brains from adult mice were dissected and fixed by immersion in 5%
glutaraldehyde, 4% PFA and 0.1 M sodium cacodylate for 2 weeks at
room temperature. Following fixation, brains were washed overnight
in 0.1 M sodium cacodylate. Coronal vibratome free-floating sections
of 30 μm were collected. The cortices, particularly somatosensory and
motor, were microdissected, post-fixed in 1% osmium tetroxide and 1.5%
potassium ferrocyanide, dehydrated, and embedded in epoxy resin.
Ultrathin sections of 80 nm were then cut from the block surface,
collected on copper grids, counter-stained with Reynolds' lead citrate
and examined under a 1200EX electron microscope (JEOL) equipped
with a 2k CCD digital camera (AMT).


Mean vesicular density 

For all transmission electron microscopy quantifications, mean vesicular
density values were calculated from the number of vesicles per
μm² of cell area for each image collected. All images were collected
at 12,000× magnification and analysis was performed blinded. Each
density value (circle on the graphs) represents an individual vessel
(capillary or arteriole). Circles of the same colour represent vessels
analysed from the same mouse. Values are expressed as mean ± s.e.m.


Immunohistochemistry 

Mice were anaesthetized with ketamine/xylazine via intraperitoneal
injection and then transcardially perfused with cold PBS
followed by cold 4% PFA. Brains and retinas were fixed by immersion
in 4% PFA/PBS overnight at 4 °C. Next, brains and retinas were
washed three times in PBS. Brain sections were either cut as 50-μm sections on
the vibratome or cryopreserved in 30% sucrose, frozen in TissueTek
OCT (Sakura) and cut as 25-μm sections on the cryostat. For MFSD2A
and eNOS immunohistochemistry, mice were euthanized by cervical
dislocation. Brains were snap-frozen with liquid nitrogen and cut as
25-μm sections on the cryostat. Brain sections were fixed with chilled
methanol for 10 min. Brain sections and retinas were blocked with 10%
goat or donkey serum, 5% BSA, PBST (0.5% Triton X-100) and stained
overnight at 4 °C with the following primary antibodies at the indicated
concentrations: MFSD2A (1:200, Cell Signaling Technologies;
RRID:AB_2617168) or MFSD2A (a gift from D. Silver, as used previously),
SMA (1:1,000, Sigma-Aldrich, no. C6198; RRID:AB_476856), ICAM2
(1:200, BD Biosciences, no. 553326; RRID:AB_394784), PECAM (1:200,
BD Biosciences, no. 553370; RRID:AB_394816), claudin 5 (ThermoFisher
Scientific, no. 34-1600; RRID:AB_2533157), eNOS (Abcam, no. ab5589;
RRID:AB_304967), MYH11 (Abcam, no. ab53219; RRID:AB_2147146)
or desmin (ThermoFisher, no. PA1-37556) and TAGLN (Abcam, no.
ab14106), followed by corresponding Alexa Fluor-conjugated secondary
antibodies (1:500, ThermoFisher) and Alexa Fluor Hydrazide


633 (1:1,000, ThermoFisher). Tissues were mounted with ProLong 
Gold for imaging. 


In situ NO detection 

This assay was adapted from previous studies. Mice were 
deeply anaesthetized with ketamine/xylazine. Mice were transcardially 
perfused with 50 ml warm (37 °C) PBS, then perfused with 50 ml warm 
PBS containing 10 µM DAF-2 (Thermo Fisher Scientific, D23842), 100 µM 
L-arginine and 2 mM CaCl2. Next, mice were perfused with warm PBS 
again followed by 4% PFA. Brains and kidneys were collected and fixed 
overnight in 4% PFA at 4 °C. Tissues were sectioned on the vibratome 
(50 µm) and processed for immunostaining. 


Light microscopy 

Olympus FluoView FV1200 and Leica SP8 laser scanning confocal 
microscopes (20×, 0.75 NA; 40×, 1.3 NA; 63×, 1.4 NA) and an Olympus VS120 
slide scanner (10×, 0.4 NA) were used for imaging retina flat-mounts 
and brain sections. Images were processed using Adobe Photoshop, 
Illustrator, Olympus FluoView and Fiji (NIH). 


Statistical analyses 

All statistical analyses were performed using Prism 7 and 8 (GraphPad 
Software). Two-group comparisons were analysed using an unpaired 
two-tailed Student's t-test or non-parametric analyses. Multiple-group 
comparisons were analysed using a one-way ANOVA, followed by a 
post hoc Bonferroni analysis to correct for multiple comparisons. No 
data were excluded when performing statistical analysis. The s.e.m. 
was calculated for all experiments and displayed as error bars in 
graphs. Statistical details for specific experiments (including exact 
n values and what n represents, precision measures, statistical tests 
used and definitions of significance) can be found in figure legends. 
Each colour circle on the graphs throughout the study represents an 
individual vessel (capillary or arteriole or SMC). The same colour of 
the circle represents vessels analysed from the same mouse. Values 
are expressed as mean ± s.e.m. Please see Supplementary Table 1 for 
statistical test results. 
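
For readers unfamiliar with the Bonferroni adjustment mentioned above, a minimal sketch follows. It shows only the textbook correction (each raw P value multiplied by the number of comparisons and capped at 1); it is an assumption that this matches what Prism reports in every case, and the function name is hypothetical.

```python
def bonferroni_adjust(p_values):
    """Return Bonferroni-adjusted P values for a family of comparisons."""
    m = len(p_values)
    # Multiply each raw P value by the number of comparisons, capped at 1.
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.5]  # made-up P values for three comparisons
    print(bonferroni_adjust(raw))
```

With three comparisons, a raw P value of 0.04 no longer falls below a 0.05 threshold after adjustment, which is the conservative behaviour the correction is meant to provide.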


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 
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plotted in Figs. 1-4 and Extended Data Fig. 1-10 are available with the 
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https://github.com/gulabneuro/Pial-Vasodilation-Analysis. The source 
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https://github.com/gulabneuro/divingArterioleTracking. 
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Extended Data Fig. 1 |In vivo two-photon imaging of neurovascular coupling 
in the barrel cortex and retrosplenial cortex. a, Setup of the in vivo 
microscopy. Awake mice with cranial windows over the barrel cortex are 
head-fixed and allowed to move on a foam ball. Whisker stimulator (arrow) 
is used for brushing whiskers to evoke neural activity in the barrel cortex. 
b–g, Imaging in the barrel cortex. b, Hydrazide injection in Thy1-GCaMP6s 
mice enables simultaneous imaging of neural activity (green) and arteriolar 
dilation (magenta). Two-photon imaging of arterioles and neural activity 
before (left) and after (right) whisker stimulation. Hashes indicate the 
baseline diameter at time = 0 s. c, Time course of change in arteriolar 
dilation (magenta) and GCaMP6s fluorescence (green). Orange bar signifies 
the period of whisker stimulation. d, Two-photon imaging of arterioles 
(magenta) and capillary blood flow (blue). After intravenous injection of 
quantum dots, the plasma is bright whereas the red blood cells are dark. 
e, High magnification of capillary outlined by the red box in d. Minimizing 
the image size increases the temporal resolution to about 610 Hz or 1.6 ms 
per frame. f, Kymographs of capillary blood flow during baseline (left) and 
whisker stimulation (right). Kymographs were generated from the parallel 
line scan (red line) of the capillary blood flow in e. g, Time course of 
change in red blood cell velocity. h, Time course of change in arteriolar 
dilation in the barrel cortex (black, n = 78 arterioles, 3 mice) and in the 
retrosplenial cortex (red, n = 54 arterioles, 3 mice). i, Maximum percentage 
change in arteriolar dilation upon whisker stimulus in these two brain 
regions. The orange bar signifies the period of whisker stimulation. Data 
are mean ± s.e.m.; nested unpaired, two-tailed t-test for i. 
Extended Data Fig. 2 | Cav1-knockout mice have impaired vasodilation in 
both pial arteries and penetrating arterioles diving deep into the 
parenchyma. a, Three-dimensional volume rendering of a two-photon-imaged 
site in Cav1+/+ mouse barrel cortex, from the pial surface to a depth of 
about 400 µm. The lumen of all vessels is filled with quantum dots (blue) 
and arterioles are labelled with hydrazide (magenta). The deepest imaged 
bin is at 300 µm because we see the appearance of the hydrazide start at 
300 µm, indicating that this is at the start of the arteriolar vessels. 
This observation is also consistent with a previous study, which 
characterized hydrazide as an arteriolar vessel marker. Grey slices 
correspond to z cross-sections shown per depth. Independent replicates for 
a were performed in five wild-type mice. b, c, Time course of change in 
arteriolar dilation in the barrel cortex from Cav1+/+ (n = 5 mice, 10–15 
arterioles per depth) (b) and Cav1−/− mice (n = 5 mice, 10–15 arterioles 
per depth) (c). d, Maximum percentage change in arteriolar dilation upon 
whisker stimulation in Cav1+/+ and Cav1−/− mice at the indicated depth. 
Statistical significance was determined by two-way ANOVA with a post hoc 
Bonferroni multiple comparison adjustment for d. All data are 
mean ± s.e.m. We compared the maximum percentage change in arteriolar 
dilation upon whisker stimulation between Cav1+/+ and Cav1−/− mice at each 
depth and also compared the responses across depth within the same 
genotype. 
Extended Data Fig. 3 | Cav1-knockout mice have attenuated vasodilation but 
normal neural activity and neurovascular coupling kinetics. a–d, Maximum 
percentage change in dilation response (a), baseline diameter (b), latency 
to maximum change in arteriolar dilation (c) and time to peak dilation (d) 
in Cav1+/+ (n = 193 arterioles, 40 capillaries, 5 mice), Cav1+/− (n = 123 
arterioles, 40 capillaries, 5 mice) and Cav1−/− mice (n = 153 arterioles, 
31 capillaries, 5 mice). e, f, Latency to maximum red blood cell flow 
velocity (e) and time to peak red blood cell flow (f) in Cav1+/+ (n = 193 
arterioles, 40 capillaries, 5 mice), Cav1+/− (n = 123 arterioles, 40 
capillaries, 5 mice) and Cav1−/− mice (n = 153 arterioles, 31 capillaries, 
5 mice). g, h, Maximum percentage change in GCaMP6s (g) and latency to peak 
change in GCaMP6s (h) in Cav1+/+;Thy1-GCaMP6s (n = 78 fields of view of the 
neuropil, 5 mice) and Cav1−/−;Thy1-GCaMP6s (n = 78 neuropils, 5 mice). 
Each circle represents an individual trial of GCaMP6s signal. i, Baseline 
diameter to absolute maximum diameter response during whisker stimulation 
in Cav1+/+ and Cav1−/− mice. j, Tail-cuff blood pressure measurements 
between Cav1+/+ (n = 5 mice) and Cav1−/− mice (n = 5 mice). Statistical 
significance was determined by a one-way nested ANOVA with a post hoc 
Bonferroni multiple comparison adjustment for a–f, a nested unpaired, 
two-tailed t-test for g, h, and two-tailed Mann–Whitney U test for j. 
All data are mean ± s.e.m. 
Extended Data Fig. 4 | Cav1-mutant mice exhibit normal SMC integrity and 
function. a, In vivo two-photon microscopy images of hydrazide (magenta) 
and DsRed (red) from Cav1+/+;NG2-DsRed+ and Cav1−/−;NG2-DsRed+ mice. 
b, Quantification of DsRed+ SMCs per 100 µm as shown in a, in Cav1+/+ 
(n = 3 mice, 27 arterioles) and Cav1−/− (n = 3 mice, 28 arterioles) mice. 
c, Immunostaining for SMC contractile proteins, including SMA, MYH11, 
TAGLN and desmin on brain arterioles from Cav1+/+ and Cav1−/− mice. 
d–g, Normalized fluorescence quantification of the various contractile 
proteins from Cav1+/+ and Cav1−/− mice. h, Still frame images of arterioles 
labelled with hydrazide (magenta) in ex vivo acute brain slices from 
Cav1+/+ and Cav1−/− mice using two-photon microscopy. Left, arterioles 
during baseline; middle, arterioles during U46619 (thromboxane agonist) 
treatment; right, arterioles during DEA NONOate (NO donor) treatment. 
White hashes outline the arterioles during baseline based on 
time = 0 min. i, j, Maximum arteriolar contraction by U46619 (i) and 
maximum arteriolar dilation by DEA NONOate (j) on acute brain slices from 
Cav1+/+ (n = 5 mice, 19 arterioles) and Cav1−/− (n = 5 mice, 22 arterioles). 
k, In vivo images of arterioles labelled with hydrazide (magenta) from 
Cav1+/+ and Cav1−/− mice using two-photon microscopy. Left, arterioles 
during baseline; right, arterioles during DEA NONOate superfusion. 
White hashes outline the arterioles during baseline based on time = 0 s. 
l, Quantification of maximum arteriolar dilation during DEA NONOate 
superfusion in vivo (n = 5 mice for both genotypes). Statistical 
significance was determined by nested, unpaired, two-tailed t-test for 
b, d–g, i, j, and by two-tailed Mann–Whitney U test for l. Data shown as 
mean ± s.e.m. 
Extended Data Fig. 5 | Caveolae in CNS aECs are abolished in aEC conditional 
Cav1-knockout mice. a, Immunostaining of adult brain sections for ECs 
(ICAM2, green), SMCs (SMA, magenta) and CAV1 (red) from control 
(BMX:CreER+;Cav1+/+) and aEC-specific conditional Cav1-mutant 
(BMX:CreER+;Cav1fl/fl) mice. Arrows point to aECs. b, Transmission 
electron microscopy images of CNS aECs and SMCs from control and 
aEC-specific conditional Cav1-mutant mice. Arrowheads point to caveolae. 
L, lumen. c, Quantification of mean normalized immunofluorescence of CAV1 
in aECs from control (n = 5 mice) and aEC-specific conditional Cav1-mutant 
mice (n = 5 mice). d, Quantification of the mean vesicular density in aECs 
and SMCs between control (n = 4 mice, 20 arterioles) and aEC-specific 
conditional Cav1-mutant mice (n = 5 mice, 22 arterioles). e, f, 
Quantification of baseline diameter (e) and baseline velocity (f) in 
control (BMX:CreER+;Cav1+/+; n = 7 mice, 260 arterioles, 122 capillaries) 
and aEC conditional Cav1-knockout mice (BMX:CreER+;Cav1fl/fl; n = 5 mice, 
193 arterioles, 94 capillaries). Statistical significance was determined 
by Mann–Whitney test for c and nested, unpaired, two-tailed t-test for 
d–f. Data are shown as mean ± s.e.m. 


Extended Data Fig. 6 | Conditional aEC-specific and SMC Cav1-knockout 
mice have normal neurovascular coupling kinetics. a, Baseline diameter to 
absolute maximum diameter response during whisker stimulation in control 
(Bmx Cav”) and mutant (BMX""*+;Cavt) mice. b, Quantification of time 
topeak arteriolar dilation in control (BMX“** ;Cavr'";n=7 mice, 234 
arterioles) and aEC-specific conditional Cav-mutant (BMX™*+;Caul 
mice; 202 arterioles) mice, Quantification of latency to peakred blood cell 
flow velocity in control (BMX"";Caut'";n=7 mice; S8 capillaries) and aEC- 
specific conditional CavI-mutant (BMX“"*"+;CavT™; n=5 mice; 25 capillaries) 
mice. d, Quantification of time to peak red blood cell flow velocity in control 
(BMX ;Cavt’”";n=7 mice;127 capillaries) and aEC specific conditional Cavi- 
mutant (BMX°"*+;Cavr";n=5 mice; 94 capillaries) mice.e, Baseline diameter 


toabsolute maximum diameter response during whisker stimulation in control 
(Myhit"* ;Cavr”) and mutant (Myhi1""";Caut™ mice.f, Quantification of 
time to peakarteriolar dilation in control (Myhti"**;Cavl’”;n=S mice, 193 
arterioles) and SMC conditional Cavi-mutant (Myhir"":Cavt;n=Smice; 
180 arterioles) mice. g, Quantification of latency to red blood cell flowin 
control (Myhi1*;Cavt'"; n= 5 mice;36 capillaries) and SMC conditional Cavi- 
mutant (MyhiI™"";Caut;n=5 mice;26 capillaries) mice. h, Quantification 
time to peakredblood cellflow velocity in (MyhI"*;Cavt: n=5 mice;75 
capillaries) and SMC conditional Cavi-mutant (MyhiI°"™;Cavl™;n=5 mice; 75, 
capillaries) mice. Statistical significance was determined by a nested 
unpaired, two-tailed t-test for b–d, f–h. 


Extended Data Fig. 7 | Conditional SMC-specific Cav1-knockout mice have 
normal neurovascular coupling. a, Immunostaining on brain sections for ECs 
(ICAM2, green), SMCs (SMA, magenta) and CAV1 (red) from control and SMC 
conditional Cav1-mutant mice. b, Transmission electron microscopy images 
of CNS aECs and SMCs from control and SMC conditional Cav1-mutant 
mice. Arrowheads point to caveolae. L, lumen. c, Mean normalized 
immunofluorescence of CAV1 in SMCs from control (n = 5 mice) and 
SMC-specific conditional Cav1-mutant mice (n = 5 mice). d, Quantification 
of the mean vesicular density in aECs and SMCs in control (n = 5 mice, 23 
arterioles) and SMC conditional Cav1-mutant mice (n = 5 mice, 22 
arterioles). e–g, Time course of change in arteriolar dilation (e), maximum 
percentage change in arteriolar dilation (f) and baseline diameter (g) in 
control (n = 7 mice, 193 arterioles) and SMC conditional Cav1-mutant mice 
(n = 5 mice, 176 arterioles). h–j, Time course of change in red blood cell 
velocity (h), maximum percentage change in red blood cell velocity (i) and 
baseline velocity (j) in control (n = 7 mice, 75 capillaries) and SMC 
conditional Cav1-mutant mice (n = 5 mice, 64 capillaries). Statistical 
significance was determined by unpaired, two-tailed Mann–Whitney U test 
for e and a nested, unpaired, two-tailed t-test for d, f, g, i, j. Data 
are mean ± s.e.m. 
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Extended Data Fig. 8 |Cavi-mutant mice have normal levelsof eNOS protein 
and NO in CNS aECs and Cav and Nos3double knockout mice have normal 
baseline diameter and red blood cell flow. a, Immunostaining on adult brain 
sections for ECs (PECAML, green), arterioles (SMA, magenta) and eNOS (cyan) 
from Cavt"*Nos3"*, Cav Nos3"" and Cavi""Nos3*" mice. Independent 
replications were performed onthreemice per genotype. b, Immunostaining 
for (PECAML, green) and arterioles (SMA, magenta) on brain sections from 
Cavt'*Nos3"*, Cav Nos3** and Cavt'"Nos3* miceafter in vivo perfusion of 
NO-sensitive dye; DAF-2, yellow. Independent replicates were performed on 
four mice per genotype.c, Quantification of eNOS immunofluorescence 
intensity as shown inainaECs from CavI""Nos3** (n=3 mice, 35images), 


CavtNos3** (n=3 mice, 35images) and CavINos3 (n=3 mice, 37 images). 
4, Quantification of DAF-2intensity in aECsas shown in (b) from Cavt*Nos3* 
(n=4 mice, 73 images), Cav Nos3”" (n=4 mice, 71images),and Cavt*Nos3* 
(n=4 mice, 64 images).e,f, Quantification of baseline diameter (e)and baseline 
velocity (f) in Cavr'*Nos3** (n=5 mice, 148 arterioles, 76 capillaries), 
CavI*-Nos3** (n=5 mice, 128 arterioles, 68 capillaries), Cavl*Nos3” (n=S 
mice, 137 arterioles, 73 capillaries) and Cav Nos3* mice(n=Smice, 139 
arterioles, 74 capillaries). Statistical significance was determined by nested, 
unpaired, two-tailed t-test for c, d, and nested, one-way ANOVA with a post hoc 
Bonferroni multiple-comparison adjustment for e, f. Data are mean ± s.e.m. 
Extended Data Fig. 9 | MFSD2A is not detected in CNS arterioles in brain and 
retina. a, b, Immunostaining on postnatal day (P)5 (a) and adult (b) brain 
sections for ECs (PECAM1, green), SMCs (SMA, magenta) and MFSD2A (white) 
from wild-type mice. Blue hashes outline SMA+ arterioles. c, d, Immunostaining 
on P5 (c) and adult (d) retina for ECs (claudin 5, green), SMCs (SMA, magenta) 
and MFSD2A (white) from wild-type mice. A, arterioles. MFSD2A is absent in 
nascent, distal vessel (arrows) in P5 retina in c. e, f, Tamoxifen-treated, adult 
knock-in Mfsd2a-CreER;Ai14 reporter mice demonstrate that tdTomato is 
absent in SMA+ arterioles but present in SMA− capillaries in brain (e) and retina 
(f). Blue hashes and A indicate SMA+ arterioles. Independent replicates for a–f 
were performed on five wild-type mice. 
Extended Data Fig. 10 | Generation of a Cre-dependent MFSD2A- 
overexpression transgenic mouse (R26). a, Construct for Cre-dependent 
MFSD2A overexpression knocked in to the ROSA26 locus. Mating 
with ROSA26:ΦC31 recombinase mice removes the neomycin selection 
cassette. Subsequent mating with BMX:CreER and tamoxifen injection enables 
ectopic overexpression of Mfsd2a in aECs. b, PCR genotyping of Cre-dependent 
MFSD2A-overexpression mice. c, Quantification of latency to changes in 
arteriolar dilation in control (BMX! R26" ‘Smice, 149 arterioles) 
and aEC-specific MFSD2A overexpression (BMX"™";R26"""";n=5 mice; 
138 arterioles) mice. d, Quantification of time to peak arteriolar dilation in 
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control (BMX ;R26!"*"""; n= 5 mice, 149 arterioles) and aEC-specific 
conditional Cavi-mutant (BMX*"';R26!*"*2""; n= mice, 138 arterioles) 
mice.e, Baseline diameter to absolute maximum diameter response during 
whisker stimulationin control (Mx R266 n= nice, 149 arterioles) 
and aEC-specific conditional Cavi-mutant (BMX°"";R26'"#*!";n=5 mic 
138 arterioles) mice. f, Immunostaining on adult retinas for ECs (isolectin, 
green), SMCs (SMA, magenta) and MFSD2A (white) from control and 
aEC-specific MFSD2A-overexpression mice. Independent replications for f were 
performed on three mice per genotype. Statistical significance was 
determined by a nested unpaired, two-tailed t-test for c, d. 
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Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 
The avascular nature of cartilage makes it a unique tissue, but whether and how the 
absence of nutrient supply regulates chondrogenesis remain unknown. Here we show 
that obstruction of vascular invasion during bone healing favours chondrogenic over 
osteogenic differentiation of skeletal progenitor cells. Unexpectedly, this process is 
driven by a decreased availability of extracellular lipids. When lipids are scarce, 
skeletal progenitors activate forkhead box O (FOXO) transcription factors, which bind 
to the Sox9 promoter and increase its expression. Besides initiating chondrogenesis, 
SOX9 acts as a regulator of cellular metabolism by suppressing oxidation of fatty 
acids, and thus adapts the cells to an avascular life. Our results define lipid scarcity as 
an important determinant of chondrogenic commitment, reveal a role for FOXO 
transcription factors during lipid starvation, and identify SOX9 as a critical metabolic 
mediator. These data highlight the importance of the nutritional microenvironment 
in the specification of skeletal cell fate. 


Bone repair reiterates the developmental endochondral ossification 
process and is initiated by periosteal skeletal progenitor cells that first 
form an avascular cartilage template which is later replaced by bone. 
Among the factors involved in chondrogenesis, the transcription factor 
SOX9 has been the most extensively studied, but how it is induced 
in skeletal progenitor cells is poorly understood. Since cartilage is 
avascular, the absence of blood vessels itself has been suggested to 
initiate chondrogenesis, but a causal link has not been confirmed 
and remains controversial. In this study, we provide evidence that 
local blood vessel availability determines skeletal progenitor cell fate 
during bone healing through a multifaceted mechanism involving lipid 
metabolism, FOXO signalling and SOX9. 


Vascularity controls skeletal cell fate 

To investigate whether the absence of vasculature determines skeletal 
progenitor fate, we transplanted viable (autologous) bone grafts 
into femoral defects in mice, inducing a periosteal-driven healing 
response. Periosteal progenitor cells near the host-graft border 
formed cartilage, whereas cells in the centre differentiated directly 
into bone-forming osteoblasts (Extended Data Fig. 1a). Periosteal 
cells did not contribute to blood vessels in the callus (Fig. 1a), but 
actively promoted vascular ingrowth as their removal reduced bone 
formation and callus vascularization (Extended Data Fig. 1b-d). At 
post-fracture day (PFD) 7 the central periosteal callus vasculature 
was highly connected with that of the surrounding muscle (Fig. 1b), 
suggesting that periosteal cells attract blood vessels from this site. To 
investigate the importance of this vascular ingrowth for bone repair, 
we inserted polycarbonate filters with different pore sizes between 
graft and muscle (Fig. 1c). Inserting a 30-µm-pore filter still allowed 
capillaries to traverse the pores at PFD7, whereas a 0.2-µm-pore 
size prevented vascular ingrowth into the periosteal layer, as shown 
by the numerous capillaries adjacent to the filter on the muscle side 
and reduced callus vascularization (Fig. 1d). Concomitantly, periosteal 
cellularity decreased because of reduced proliferation and moderately 
increased cell death (Extended Data Fig. 2a, b), but more importantly, 
Fig. 1 | Preventing vascular ingrowth during bone healing induces 
chondrogenesis. a, Immunofluorescence analysis of bone-graft periosteal cell 
tracing showing contribution to cartilage and bone (arrows, GFP+ osteoblasts; 
arrowheads, GFP+ osteocytes) in the graft callus at PFD14, while CD31+ blood 
vessels (red) are mainly host-derived (representative images of 4 mice). Scale 
bars, 50 µm. b, Immunofluorescence analysis of a bone-autograft section 
revealing the interconnected periosteal callus and skeletal muscle vasculature 
at PFD7 (representative image of 3 mice). Scale bar, 200 µm. c, Schematic 
representation of the autograft model with filter. d, Immunohistochemical 
analysis and quantification of callus vascularization at PFD7 when a filter with 
30 µm (filter 30; arrows indicate blood vessels passing through filter pores) or 
0.2 µm (filter 0.2) pore size was placed in between muscle and graft (n = 4 mice 
for control and filter 30, n = 5 mice for filter 0.2). Scale bars, 50 µm in detail 
images, otherwise 200 µm. e, Visualization and quantification of early 
chondrogenic cells in the callus of grafts with or without a filter (0.2 µm) at 
PFD7 by immunofluorescence for SOX9 (n = 7 mice). Scale bars, 50 µm. 
f, Visualization and quantification of cartilage in the callus of autografts with 
and without filter (0.2 µm) at PFD14 by immunofluorescence for collagen type 2 
(COL2) (n = 4 mice for control, n = 6 mice for filter 0.2). Scale bars, 500 µm. 
b, bone; c, cartilage; f, filter; g, graft; h, host; m, muscle; pc, periosteal callus. 
Data are mean ± s.e.m.; one-way ANOVA with Bonferroni post hoc test (d), 
two-tailed Student's t-test (e, f). 


the number of SOX9+ early chondrogenic cells was higher at the central 
graft region (Fig. 1e). This chondrogenic switch resulted in less bone 
(Extended Data Fig. 2c) but more type 2 collagen (COL2)+ cartilage 
matrix in the central region at PFD14 (Fig. 1f), where graft cells 
differentiated to chondrocytes instead of osteoblasts (Extended Data 
Fig. 2d). At PFD28, successful healing was observed in both conditions, 
although the presence of small cartilage islands in the callus with filter 
(75.0 ± 14.4% of sections) suggests delayed healing (Extended Data 
Fig. 2e, f). Thus, skeletal progenitor cells undergo chondrogenic rather 
than osteogenic differentiation when blood supply is limited, securing 
successful bone healing. 

During bone healing, the vasculature supplies nutrients (oxygen, 
glucose, amino acids and lipids), growth factors and perivascular 
progenitor cells. To distinguish between these components, we 
applied a computational model of bone healing to our bone-graft 
setup, in which cell fate and tissue formation are controlled 
by nutrient availability, osteochondrogenic growth factors, matrix 
density and local cell number (Extended Data Fig. 3a, b). The model 
correctly described the spatiotemporal progression of normal bone-graft 
healing (that is, blood vessels can come from the muscle; compare 
Extended Data Fig. 3b with Extended Data Fig. 1a). When the 
presence of a filter was mimicked by limiting diffusion of nutrients 
from the muscle side (20–40% of the nutrients normally supplied by 
the vasculature), the model recapitulated the chondrogenic switch 
in the central graft region (Extended Data Fig. 3c, d). An additional 
supply of growth factors and/or progenitor cells from the muscle 
side did not significantly affect this bone repair profile (Extended 
Data Fig. 3e). The in silico model thus supports the hypothesis that 
nutrients supplied by the vasculature regulate skeletal progenitor 
cell differentiation. 


Lipid scarcity induces chondrogenesis 

To test this hypothesis, we investigated the nutritional control of cell 
fate using two models of skeletal progenitors: the C3H10T1/2 cell line, 
a homogeneous population retaining multipotency properties, and 
primary murine periosteal cells, which are more heterogeneous but 
contain true skeletal stem and progenitor cells. We confirmed key 
findings in immunophenotypically defined skeletal stem cells isolated 
from total long bones of newborn mice, which are homogeneous but 
limited in number. 

Combined nutrient deprivation (CND; reduced levels of serum, 
oxygen, glucose and glutamine) increased SOX9 protein and mRNA 
levels in C3H10T1/2 or periosteal cells, without changes in expression 
of osteogenic, adipogenic or myogenic transcription factors (Fig. 2a, 
Extended Data Fig. 4a-c). Depriving C3H10T1/2 cells of individual 
nutrients revealed that low oxygen levels increased SOX9, as reported, 
whereas lowering glucose or glutamine levels had little effect 
(Fig. 2b). Unexpectedly, serum deprivation led to massive and rapid 
accumulation of SOX9 mRNA and protein, resulting from increased 
transcription and translation (Fig. 2b, Extended Data Fig. 4d-g). 
Expression of osteogenic, adipogenic and myogenic transcription 
factors did not change (Extended Data Fig. 4h). Serum deprivation 
also increased SOX9 in periosteal cells (Extended Data Fig. 4i) and 
enhanced their chondrogenic differentiation in micromass cultures 
(Fig. 2c), but prevented osteogenic differentiation (Extended Data 
Fig. 4j). A possible explanation for this chondrogenic switch is avoiding 
cell death. Indeed, knockdown of SOX9 in C3H10T1/2 cells, periosteal 
cells and growth plate-derived chondrocytes reduced cell viability in 
CND, and to a minor extent also in serum deprivation (Extended Data 
Fig. 4k, l). Thus, skeletal progenitor cells rapidly adapt to specific 
nutritional stress by increasing SOX9 levels and undergoing 
chondrogenic commitment. 

Serum represents the main source of lipids, and we questioned 
whether serum deprivation-induced chondrogenic commitment of 
skeletal progenitors could be attributed to lipid deprivation. 
Resupplying C3H10T1/2 cells with oleate (Fig. 2d), palmitate, very low density 
lipoproteins or polyunsaturated fatty acids (PUFA) (Extended Data 
Fig. 5a-c) prevented the increase in SOX9 during serum deprivation. 
In addition, lipid-reduced serum (LRS) mimicked the effects of serum 
deprivation. LRS increased SOX9 levels in C3H10T1/2 cells (Fig. 2e), 
promoted chondrogenic differentiation of periosteal cells in micromass 
or pellet cultures, an effect partially reversed by exogenous fatty acids 
(Fig. 2f, Extended Data Fig. 5d), and inhibited their osteogenic 
differentiation (Extended Data Fig. 5e). Importantly, serum deprivation or 
LRS also increased SOX9 levels in skeletal stem cells (Extended Data 
Fig. 5f). In all studied cell types, lipid deprivation increased the number 
of SOX9high cells, and cell cycle and apoptosis analysis showed this was 
Fig. 2 | Lipid scarcity induces SOX9 in skeletal progenitors. a, b, Immunoblot 
detection of total SOX9 in C3H10T1/2 cells exposed for 24 h to control (C) or 
CND medium (a) or to different nutritional stresses (FBS, fetal bovine serum; 
Glc, glucose; Gln, glutamine) (b), with β-actin as loading control (n = 2 
independent experiments). c, Chondrogenic differentiation of periosteal cells 
in control or serum deprivation (SD) medium, assessed by visualization of 
chondrogenic matrix deposition (alcian blue staining) and quantification of 
Sox9, Col2a1 and Acan mRNA levels (relative to Actb, n = 6 biologically 
independent samples). d, e, Immunoblot detection of total SOX9 in C3H10T1/2 
cells exposed for 6 h to control medium, SD medium, SD medium 
supplemented with increasing concentrations of oleate (d) or LRS medium (e), 
with β-actin as loading control (n = 2 independent experiments). 
f, Chondrogenic differentiation of periosteal cells in control medium, LRS 


not due to selection of a pre-existing SOX9"*" population (Extended 
Data Fig. 5f-h). 

We next tested whether lipid availability also controls skeletal pro- 
genitor differentiation in more physiologically relevant settings. Since 
it is not feasible to locally deprive cells specifically of exogenous lipids 
in vivo, we first used embryonic metatarsal cultures, an organ-like 
ex vivo model of bone development. Serum deprivation increased 
the number of SOX9+ chondrocytes and prevented osteogenesis, 
evidenced by absence of Col1a1-expressing cells and mineralization, 
which was reversed by fatty acid supplementation (Extended Data 
Fig. 5). Second, local injection of fatty acids during fracture repair 
reduced the amount of cartilage in the callus, with no change in newly 
formed bone (Extended Data Fig. 5k). Third, GW9508, an agonist of 
free fatty acid receptor 1 (FFAR1) and FFAR4, prevented the increase 
in SOX9 induced by serum deprivation or LRS in the three cell models 
(Fig. 2g; Extended Data Fig. 5l). Accordingly, locally injecting GW9508 
during fracture repair decreased cartilage in the callus without affect- 
ing woven bone areas (Fig. 2h). Together, these results indicate that low 
local lipid levels promote chondrogenesis of skeletal progenitor cells in vivo. 

Our findings suggest that the chondrogenic switch during bone- 
graft healing in the presence of a filter (Fig. 1) is primarily due to the 
absence of exogenous lipids, which is linked to poor vascularization. 
We found that diffusion of lipids in a collagen gel containing periosteal 
cells is much lower than that of glucose (Extended Data Fig. 5m), indi- 
cating that lipids are a limiting nutrient when vascularization is inad- 
equate. Furthermore, we showed that the absence of specific cell types, 
potentially blocked by the filter, does not impact chondrogenesis. 
Indeed, serum deprivation-supported chondrogenic differentiation 
of periosteal cells in micromass cultures was not prevented by mus- 
cle-derived endothelial cells, macrophages or pericytes, in contrast 
to fatty acid supplementation (Extended Data Fig. 5n, o). Together 
with our in vivo (Fig. 1) and in silico (Extended Data Fig. 3) results, 
this shows that lipid deprivation caused by reduced vascularization 
is probably an important determinant of periosteal chondrogenesis 
during bone healing. 
medium, SD medium or SD medium supplemented with 60 μM oleate (OL), 
assessed by alcian blue staining and quantification of Col2a1 and Acan mRNA 
levels (relative to Actb, n = 6 biologically independent samples). Veh, vehicle. 
g, Flow cytometric quantification of total SOX9 levels in periosteal cells 
exposed for 24 h to control medium, SD medium or LRS medium supplemented 
with 100 μM GW9508 (FFAR1/4 agonist) or vehicle (DMSO) (n = 3 biologically 
independent samples). h, Histological visualization (safranin O staining) and 
quantification of cartilage and woven bone in the callus at PFD7 of mice treated 
daily with GW9508 (10 nmol) or vehicle (0.2% DMSO in saline) at the fracture 
site (n = 5 mice). Scale bars, 500 μm. Data are mean ± s.e.m.; two-tailed 
Student's t-test (c, h), one-way ANOVA (f) or two-way ANOVA (g) with 
Bonferroni post hoc test. For gel source data, see Supplementary Fig. 1. 


Chondrocytes have low fatty acid oxidation 

Why would chondrogenic commitment be beneficial when lipids are 
scarce? We hypothesized that chondrocyte metabolism does not rely 
on exogenous lipids. To test this, we compared the metabolic profile 
of chondrocytes to that of skeletal progenitors and mature osteoblasts 
(Fig. 3a, Extended Data Fig. 6a). Chondrocytes were highly glycolytic, 
as reported. Osteoblasts showed the highest oxygen consumption 
rate (OCR), which was not owing to high glucose oxidation, but to a 
higher rate of fatty acid oxidation (FAO). Chondrocytes exhibited low 
FAO and skeletal progenitors had an intermediate profile. To confirm 
these findings in vivo, we examined metabolic-gene signatures in a 
mouse long-bone single-cell RNA-sequencing (RNA-seq) dataset that 
we generated recently. This atlas encompasses 17 non-haematopoietic 
cell types including skeletal progenitors, chondrocytes and osteoblasts 
(Extended Data Fig. 6b). The different chondrocyte populations (clus- 
ters 2, 10, 13, 17; Sox9+Acan+) showed low expression of FAO genes and 
high expression of glycolytic genes compared with osteoblasts (clus- 
ters 7 and 8; Col1a1+Ocn+; Ocn is also known as Bglap) and, to a minor 
extent, skeletal progenitors (clusters 1 and 4; Grem1+) (Extended Data 
Fig. 6b, c). Gene expression analysis confirmed higher expression of 
the glycolytic genes Slc2a1 (encoding GLUT1), Pfkfb3 and Ldha, but 
lower expression of the FAO-related genes Cpt1a, Acadm and Acad in 
growth plate cartilage versus cortical bone samples (Extended Data 
Fig. 6d). Immunohistochemistry showed low CPT1a levels and high 
GLUT1 levels in chondrocytes of the growth plate and fracture callus, 
whereas trabecular bone osteoblasts displayed high levels of both 
CPT1a and GLUT1 (Fig. 3b). Intravenous injection of fluorescent fatty 
acid and glucose analogues revealed labelled fatty acids in osteoblasts 
but not in chondrocytes in the growth plate or fracture callus, whereas 
labelled glucose was taken up by both cell types (Extended Data Fig. 6e, 
f), confirming that low FAO in chondrocytes correlates with lipid scar- 
city. Transplantation experiments showed that loss of CPT1a abrogates 
osteogenic differentiation of skeletal stem cells during fracture healing 
but preserves their ability to become chondrocytes (Fig. 3c, Extended 
Data Fig. 6g). In addition, etomoxir, a CPT1 inhibitor, decreased viability 
Fig. 3 | SOX9 suppresses FAO in chondrocytes. a, Quantification of glucose 
consumption and lactate secretion (PC, COB: n = 6; GCH: n= biologically 
independent samples), glycolytic rate (n = 3 biologically independent samples), 
oxygen consumption (PC, COB: n = 7; GCH: n = 5 biologically independent 
samples), glucose oxidation (n = 3 biologically independent samples) and 
palmitate oxidation (n = 3 biologically independent samples) in periosteal 
cells (PC), growth plate-derived chondrocytes (GCH) and calvarial osteoblasts 
(COB). b, Analysis of adjacent histological sections of a growth plate 
and fracture callus (PFD7) by safranin O staining (cartilage) or 
immunofluorescence for CPT1a or GLUT1 (representative images of 3 mice). 
Scale bars, 100 μm. Dotted white lines delineate cartilage areas. c, Histological 
visualization and quantification of early chondrogenic (SOX9+) and osteogenic 
(COL1+) cells in the callus of fractures (PFD7) transplanted with CAG-DsRed+ 
skeletal stem cells (SSC) transduced with shRNA against Cpt1a (shCpt1a) or 
scrambled shRNA control (shScr) (n = 3 mice). Scale bars, 50 μm. 


and numbers of cultured calvarial osteoblasts but not growth plate- 
derived chondrocytes (Extended Data Fig. 6h). Thus, chondrocytes 
exhibit a low rate of FAO consistent with local lipid scarcity, and do not 
depend on this pathway to fulfil their metabolic demands. 


SOX9 suppresses FAO 

We next determined how lipid deprivation affects the rate of FAO in 
skeletal progenitor cells. As expected, oxidation of extracellular pal- 
mitate immediately dropped after exposing periosteal cells to serum 
deprivation or LRS (Fig. 3d, Extended Data Fig. 7a). Surprisingly, cells 
temporarily maintained total FAO, which was quantified indirectly 
by measuring etomoxir-sensitive OCR, for 6 h after serum depri- 
vation (Fig. 3e, Extended Data Fig. 7b), suggesting that they initially 
d, Measurement of oxidation of extracellularly added palmitate by periosteal 
cells in control medium or at different times in SD medium (n = 3 biologically 
independent samples). e, Quantification of FAO-linked OCR in periosteal cells 
in control medium or at different times in SD medium (3 h: n = 2, other time 
points: n = 3 biologically independent samples). f, Quantification of FAO-linked 
OCR in periosteal cells transduced with shSox9 or shScr in control medium or 
at different times in SD medium (shScr 12 h, shSox9 control, shSox9 3 h: n = 5; all 
others: n = 6 biologically independent samples). g, Quantification of FAO- 
linked OCR in GCH transduced with shSox9 or shScr (n = 5 biologically 
independent samples). h, Quantification of palmitate oxidation in GCH 
transduced with shSox9 or shScr, and in COB transduced with a lentiviral vector 
encoding SOX9 (SOX9OE, SOX9 overexpression) or an empty vector (EV) (n = 4 
biologically independent samples). Data are mean ± s.e.m.; one-way ANOVA 
(a, d, e) or two-way ANOVA (f) with Bonferroni post hoc test, two-tailed 
Student's t-test (c, g, h). 


compensate for the scarcity of exogenous lipids, possibly through 
mobilization of intracellular lipid stores. Indeed, fluorescent fatty 
acids translocated from lipid droplets into mitochondria, where FAO 
takes place, when periosteal cells were exposed to serum deprivation 
(Extended Data Fig. 7c). Starvation-induced lipid-droplet generation 
and breakdown are both linked to autophagy, and we confirmed 
that C3H10T1/2 cells and periosteal cells activate autophagy early 
after serum deprivation (Extended Data Fig. 7d-f). Accordingly, 
lipid-droplet number initially increased during serum deprivation 
in C3H10T1/2 cells before decreasing at 6 h, and knockdown of the 
essential autophagosome protein ATG5 prevented both the initial 
increase and the late breakdown of lipid droplets after serum depriva- 
tion (Extended Data Fig. 7g). Furthermore, the lysosomotropic drug 
chloroquine immediately reduced the FAO-linked OCR upon exposure 
Fig. 4 | Lipids regulate SOX9 through FOXO signalling. a, b, Volcano plot 
showing significantly enriched and depleted mRNAs (a) and top-10 most- 
significantly enriched transcription factor motifs with normalized enrichment 
scores (NES) as determined by i-cisTarget analysis (b) in C3H10T1/2 cells 
exposed for 1 h to SD medium versus control medium (n = 3 replicates). Motif 
shown on top is the FOXO/forkhead motif. c, Confocal microscopy of 
C3H10T1/2 cells stained for FOXO1 (top) or FOXO3a (bottom) shows increased 
nuclear localization after exposure of cells for 3 h to SD medium or LRS medium 
(representative images of independent experiments). Scale bars, 20 μm. 
d, Immunoblot detection of nuclear FOXO1 and FOXO3a in C3H10T1/2 cells 
exposed for 1, 3 or 6 h to control medium or SD medium, with lamin A/C as 
loading control (n = 2 independent experiments). e, Nuclear FOXO activity in 
C3H10T1/2 cells exposed for 3 h to control medium, SD medium or LRS medium 
supplemented with vehicle (EtOH), oleate (60 μM) or PUFA (n = 3 independent 
experiments). f, Occupancy of FOXO3a at the Sox9 promoter of Cas9- 
expressing C3H10T1/2 cells transduced with inducible short guide RNA 
(sgRNA) against Foxo1 (sgFoxo1), sgFoxo3a or a scrambled sgRNA (sgScr), 
exposed for 3 h to control medium or SD medium in the presence of 
doxycycline (250 ng ml−1), as determined by chromatin immunoprecipitation 
with quantitative PCR (ChIP-qPCR) (n = 3 independent experiments). g, Flow 
cytometric quantification of total SOX9 levels in periosteal cells exposed for 


of periosteal cells to serum deprivation (Extended Data Fig. 7h) and 
decreased survival of C3H10T1/2 cells and periosteal cells during serum 
deprivation (Extended Data Fig. 7i). Together, these data show that 
skeletal progenitors depend on lysosome-mediated mobilization of 
intracellular lipid stores to temporarily support FAO and secure survival 
when extracellular lipids become limited. 

The increase in SOX9 levels (Extended Data Fig. 4d, e) and the 
decrease in total FAO (Fig. 3e) occur concomitantly after lipid dep- 
rivation, suggesting that they are connected. Deletion of SOX9 in 
periosteal cells prevented the suppression of FAO by serum depriva- 
tion (Fig. 3f), whereas inhibition of FAO with etomoxir did not alter 
SOX9 levels (Extended Data Fig. 7). Moreover, knockdown of SOX9 in 
24 h to control medium, SD medium or LRS medium supplemented with 1 μM 
AS1842856 (FOXO inhibitor) or vehicle (DMSO) (n = 4 biologically independent 
samples). h, Histological visualization and quantification of FOXO3a- 
expressing cells in the central periosteal callus of grafts with or without a filter 
(0.2 μm pore size) at PFD7 (control: n = 7, filter 0.2: n = 8 mice). Scale bars, 
50 μm. i, Histological visualization (safranin O staining) and quantification of 
cartilage and woven bone in the callus at PFD7 of mice treated daily with 
AS1842856 (500 pmol) or vehicle (0.1% DMSO in saline) at the fracture site 
(vehicle: n = 4, AS1842856: n = 5 mice). Scale bars, 500 μm. j, Schematic 
overview of main findings. During bone fracture healing, skeletal progenitor 
cells in adequately vascularized environments differentiate into osteoblasts, 
which require high levels of exogenous fatty acids (FA) to fuel their FAO- 
dependent metabolism. Cells in regions with a poor vascular supply will 
temporarily support FAO by breaking down intracellular lipid droplets (LD), 
while the lack of dietary lipids simultaneously increases FOXO activity. FOXOs 
increase levels of SOX9, which activates the chondrogenic program and blocks 
FAO. These adaptations promote cell survival and secure bone healing even in 
nutrient-poor environments. Data are mean ± s.e.m.; two-way ANOVA with 
Bonferroni post hoc test (e-g), two-tailed Student's t-test (h, i). For gel source 
data, see Supplementary Fig. 1. 


growth plate-derived chondrocytes induced not only loss of typical 
chondrocyte characteristics such as cobblestone-like morphology 
and expression of Col2a1 and Acan (Extended Data Fig. 7k, l), but also 
increased expression of Cpt1a and Acad (Extended Data Fig. 7l) and the 
rate of FAO in chondrocytes (Fig. 3g, h). By contrast, overexpression of 
SOX9 in calvarial osteoblasts decreased FAO (Fig. 3h). SOX9 thus acts 
as a metabolic regulator in chondrogenic cells by suppressing FAO. 


FOXOs induce SOX9 upon lipid starvation 


We next examined how lipids regulate SOX9 levels. Transcriptomics 
showed robust upregulation of Sox9 expression in C3H10T1/2 cells 
starting 1 h after serum deprivation and increased expression of several 
other, but not all, chondrogenic markers from 3 h onwards (Extended 
Data Fig. 8a). Differential expression analysis showed that 678 (1 h), 
4,022 (3 h) and 3,811 (6 h) genes were significantly upregulated by serum 
deprivation, including Sox9 as one of the top hits at all time points 
(Fig. 4a, Extended Data Fig. 8b). A total of 757 (1 h), 2,167 (3 h) and 3,872 
(6 h) genes were significantly downregulated, including genes associ- 
ated with proliferation (Egr3, Dusp5 and Errfi1), skeletal stem cells (Nes 
and Itga5) and osteogenesis (Spp1 and Adam19) (Fig. 4a; Extended Data 
Fig. 8b). Transcription factor-binding-motif analysis of the top-100 
overexpressed genes at each time point showed strong enrichment 
of the FOXO/forkhead motif (Fig. 4b, Extended Data Fig. 8c). We con- 
firmed that serum deprivation increases nuclear FOXO1 and FOXO3a 
in C3H10T1/2 cells (Fig. 4c, d) and active FOXO levels in C3H10T1/2 
and skeletal stem cells, an effect prevented by exogenous fatty acids 
(Fig. 4e, Extended Data Fig. 8d-f), indicating that extracellular lipids 
control FOXO activity. More specifically, FOXO1 and FOXO3a showed 
increased binding to the Sox9 promoter during serum deprivation 
(Fig. 4f, Extended Data Fig. 8g), and the FOXO inhibitor AS1842856 
prevented induction of SOX9 during lipid deprivation in all cell types 
(Fig. 4g, Extended Data Fig. 8h). Similar results were obtained using 
a CRISPR-Cas9 approach to conditionally delete Foxo1 and Foxo3a 
(also known as Foxo3) in C3H10T1/2 cells, or using short hairpin RNAs 
(shRNAs) in skeletal stem cells (Extended Data Fig. 8i, j). These data 
demonstrate that FOXOs directly control Sox9 transcription during 
lipid deprivation. 

We next confirmed the relation between lipid deprivation, FOXOs 
and SOX9 during bone healing. First, the presence of the filter 
(0.2 μm) during bone-graft healing increased the number of cells 
positive for nuclear FOXO3a in the central periosteal region (Fig. 4h), 
similar to the increase in SOX9+ cells (Fig. 1e). Second, stimulation 
of fatty acid signalling using the FFAR1/4 agonist GW9508 during 
fracture healing strongly reduced the number of FOXO3a+ nuclei 
in the periosteal callus (Extended Data Fig. 8k), correlating with 
reduced amounts of cartilage (Fig. 2h). Third, skeletal stem cells 
with FOXO1 and FOXO3 inactivation failed to engraft into tibial frac- 
tures (Extended Data Fig. 8l), which may be owing to their inability 
to increase SOX9 levels upon lipid deprivation, or a general failure to 
survive transplantation-associated stress. Finally, local daily injection 
of the FOXO inhibitor AS1842856 during fracture healing reduced the 
amount of cartilage while not affecting new bone formation (Fig. 4i). 
Thus, FOXO signalling in vivo is negatively regulated by lipid avail- 
ability and is required for skeletal progenitor cell chondrogenesis 
and survival during bone healing. 


Discussion 

On the basis of our findings, we propose a model in which the local 
vasculature, through supply of lipids, influences skeletal progenitor 
differentiation during fracture healing (Fig. 4j). Cells close to blood 
vessels become osteoblasts, which depend on FAO to support their 
metabolic demands. Skeletal progenitors in poorly vascularized regions 
sustain FAO for a short time by mobilizing intracellular lipid stores and 
then activate FOXO signalling as a result of exogenous lipid starvation. 
Nuclear localization of FOXOs promotes expression of SOX9, which 
induces chondrogenic commitment and suppresses FAO to allow long- 
term cell survival. 

Low lipid levels are thus the main nutritional determinant for chon- 
drogenic commitment of skeletal progenitor cells, rather than lack of 
oxygen or glucose, although growth factors are indispensable to 
activate the full chondrogenic-differentiation program. In contrast 
to osteoblasts, we find that chondrocytes are largely independ- 
ent of FAO, consistent with poor diffusion of fatty acids in cartilage 
tissue. This metabolic independence from extracellular lipids would 
therefore be beneficial in the avascular cartilage environment. FAO in 
chondrocytes is suppressed by SOX9, attributing a novel metabolic 
regulatory role to this transcription factor. Mechanistically, reduced 
lipid availability is translated into SOX9 production through FOXOs, 
well-known regulators of the cellular response to metabolic stress. 
We propose lipid starvation as an additional trigger for FOXO activa- 
tion, although the full signalling cascade and exact lipid sensor remain 
unknown. Of interest, osteoarthritis is associated with increased angio- 
genesis and FAO but reduced SOX9 levels and FOXO activity. 
Our results show that all these phenomena may be connected to local 
lipid availability, suggesting that manipulation of lipid metabolism 
could be of therapeutic interest. More generally, our findings show 
that local nutrient levels can decide stem-cell lineage choice through 
direct transcriptional changes. As a consequence, the metabolic profile 
of a mature cell may reflect microenvironmental constraints as much 
as particular cellular needs. 
Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized, unless otherwise mentioned. The 
investigators were not blinded to allocation during experiments and 
outcome assessment. 


Mice 
C57BL/6J mice, 129/Sv mice (Janvier Labs), B6.Cg-Tg(CAG-eGFP) 
mice, B6.Cg-Tg(Col1a1-cre/ERT2,-DsRed)1Smkm/J mice, B6;129S4- 
Sox9tm1.1Tlu/J mice and B6.Cg-Tg(CAG-DsRed*MST)1Nagy/J mice 
(The Jackson Laboratory) were used in this study. Unless otherwise 
specified, both male and female mice were used for all experiments. 
All animal experiments were conducted according to the regulations 
and with approval of the Animal Ethics Committee of the KU Leuven. 


Mouse bone-healing models 

The femoral segmental bone-graft model was adapted from a previ- 
ously described model. Eight- to ten-week-old male C57BL/6J mice 
were anaesthetized with a ketamine-xylazine mixture (100 mg per kg 
ketamine and 15 mg per kg xylazine) and the right femur was exposed. 
A mid-diaphyseal 4-mm bone segment was excised with a 6.5-mm dia- 
mond saw disk (Codema), briefly washed in saline to remove the bone 
marrow (periosteum not removed) and the segment was subsequently 
reimplanted in the defect (autograft). To investigate the contribution 
of donor cells, grafts were isolated from CAG-eGFP mice (periosteum 
not removed) and transplanted in wild-type littermates. To obtain 
devitalized allografts, 4-mm bone segments were isolated from 129/Sv 
mice, washed in saline to remove the bone marrow, scraped to remove 
the periosteum, sterilized in 70% ethanol and frozen at −80 °C for at 
least 1 week. After graft implantation, the defect was stabilized with an 
intramedullary metal pin (22-gauge needle). To create a compromised 
host environment, a polycarbonate filter with a pore size of 30 μm or 
0.2 μm (Sterlitech) was inserted between the muscle and the graft at 
the time of surgery. 

The tibial fracture healing model was performed as previously 
described. For studies with the FFAR1/4 agonist GW9508, mice were 
treated daily by subcutaneously injecting 50 μl of a 200 μM GW9508 
(Cayman Chemical) solution or vehicle (0.2% DMSO in saline) at the 
fracture site. For fatty acid delivery, mice were treated daily by subcu- 
taneously injecting 20 μl corn oil (Sigma) or control solution (saline) 
at the fracture site. For studies with the FOXO inhibitor AS1842856, 
mice were treated daily by subcutaneously injecting 50 μl of a 10 μM 
AS1842856 (Calbiochem) solution or vehicle (0.1% DMSO in saline) 
at the fracture site. For metabolite labelling experiments, mice were 
injected intravenously with the fluorescent fatty acid analogue BODIPY 
558/568 C12 (Red-C12; Invitrogen) at 1 mg per kg body weight and the 
fluorescent glucose analogue 2-(N-(7-nitrobenz-2-oxa-1,3-diazol-4-yl) 
amino)-2-deoxyglucose (2-NBDG; Invitrogen) at 12.5 mg per kg body 
weight, 15 min before euthanasia. For skeletal stem cell transplanta- 
tions, 100,000 cells (shCpt1a experiments) or 20,000 cells (shFoxo1 
and shFoxo3a experiments) were resuspended in 5 μl of a 5 mg ml−1 
collagen gel (rat tail collagen type I, Corning) and transplanted at the 
fracture site at the time of surgery. 
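
The per-injection amounts quoted in the figure legends (10 nmol GW9508, 500 pmol AS1842856) follow directly from the injected volumes and concentrations given above. A minimal worked check is sketched below; the function name `injected_amount_mol` is purely illustrative and not part of the original methods.

```python
def injected_amount_mol(volume_ul: float, concentration_um: float) -> float:
    """Amount of compound (mol) in an injection of the given volume and concentration."""
    volume_l = volume_ul * 1e-6                # microlitres -> litres
    concentration_m = concentration_um * 1e-6  # micromolar -> molar
    return volume_l * concentration_m


# Values taken from the text: 50 ul of 200 uM GW9508, 50 ul of 10 uM AS1842856.
gw9508_mol = injected_amount_mol(50, 200)
as1842856_mol = injected_amount_mol(50, 10)

print(f"GW9508: {gw9508_mol * 1e9:.1f} nmol per injection")
print(f"AS1842856: {as1842856_mol * 1e12:.0f} pmol per injection")
```

Both results agree with the doses stated in the legends of Fig. 2h and Fig. 4i.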


Micro-computed tomography analysis 

Mice were euthanised at 2 or 4 weeks after surgery and grafted bones 
were isolated. For bone analysis, samples were scanned using the high- 
resolution SkyScan 1172 micro-computed tomography (microCT) sys- 
tem (Bruker-microCT) at a pixel size of 10 μm with 50 kV tube voltage 
and 0.5 mm aluminium filter. To reduce the metal artefacts induced by 
the presence of the intramedullary pin, microCT projection data were 
reconstructed using an iterative reconstruction technique and projec- 
tion completion. Custom software was made in MeVisLab (MeVis Medi- 
cal Solutions) to visualize and analyse the obtained microCT images. 


The boundary between graft and callus was manually delineated and 
mineralized tissue was segmented using hysteresis thresholding. For 
visual representation, grafts are represented in a different colour to cal- 
lus and host bone. The coverage ratio was calculated as the percentage 
of the graft surface that is covered with callus by determining whether 
the normal line to the graft surface encounters mineralized callus, for 
each point of the graft surface. 
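
The coverage-ratio definition above can be illustrated with a small sketch. This is not the custom MeVisLab software used in the study; it is a simplified illustration that assumes a binary callus mask on a regular grid, pre-computed surface points with outward normals, nearest-voxel sampling and a fixed search distance (`max_steps` and `step` are hypothetical parameters).

```python
import numpy as np


def coverage_ratio(callus_mask, surface_points, normals, max_steps=50, step=1.0):
    """Percentage of surface points whose outward normal line meets mineralized callus."""
    callus_mask = np.asarray(callus_mask, dtype=bool)
    covered = 0
    for point, normal in zip(surface_points, normals):
        origin = np.asarray(point, dtype=float)
        direction = np.asarray(normal, dtype=float)
        direction = direction / np.linalg.norm(direction)
        # March outward along the normal, one step at a time.
        for k in range(1, max_steps + 1):
            idx = np.rint(origin + k * step * direction).astype(int)
            if np.any(idx < 0) or np.any(idx >= callus_mask.shape):
                break  # left the image without meeting callus
            if callus_mask[tuple(idx)]:
                covered += 1
                break
    return 100.0 * covered / len(surface_points)


# Toy 2D example: a flat graft surface along row 5, callus above its left half.
mask = np.zeros((10, 10), dtype=bool)
mask[2, 0:5] = True
points = [(5, c) for c in range(10)]
outward = [(-1, 0)] * 10
print(coverage_ratio(mask, points, outward))
```

In this toy case the callus lies over half of the surface points, so the function reports a coverage of 50%.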

For visualization and quantification of the vasculature, mice were 
anaesthetized with a ketamine-xylazine-heparin mixture (100 mg per kg 
ketamine, 15 mg per kg xylazine and 1,000 U per kg heparin) and suc- 
cessively perfused with 10 ml of heparinized saline (100 U ml−1), 10 ml 
of a 10% neutral-buffered formalin solution, 10 ml of saline and 5 ml 
of a preheated 30% barium sulfate solution (Micropaque, Guerbet) 
containing 2% gelatine. After perfusion, animals were placed on ice 
for at least 1 h and subsequently kept at 4 °C overnight to allow the 
gelatine to solidify, before removing the grafted hindlimbs for dual- 
energy microCT analysis. Two microCT scans of each sample were 
taken on the SkyScan 1172 microCT system with effective beam energy 
below (50 kV tube voltage with 0.5 mm aluminium filter) and above 
(100 kV tube voltage with 0.5 mm aluminium and 0.038 mm copper 
filter) the K-edge energy of barium sulfate, both with an image pixel size 
of 5 μm. By combining the low- and high-energy acquisitions, an image 
of the (barium sulfate-perfused) vasculature only was reconstructed 
as described, and a segmentation of the vasculature was obtained 
by thresholding this image. A segmentation of the bone was obtained 
by thresholding the bone and vasculature out of the low-energy recon- 
struction and removing the calculated vasculature from it. After delin- 
eating a 250-μm-wide region of interest around the graft surface using 
a custom-made MeVisLab software package, calculation of the number 
of blood vessels and the average vessel thickness was performed using 
the CTAn software (Bruker-microCT). 


Immunohistochemistry 

To isolate bones for histological analysis, mice were anaesthetized 
with ketamine-xylazine-heparin and perfused with 10 ml of heparin- 
ized saline followed by 10 ml of 2% paraformaldehyde in PBS. Isolated 
bones were further fixed in 2% paraformaldehyde overnight and 
decalcified in EDTA for 14 days at 4 °C. Samples were either embed- 
ded in paraffin and sectioned at 4 μm, embedded in agarose for 
vibratome sections (100 μm thick) or embedded in NEG-50 frozen sec- 
tion medium (Richard-Allen Scientific) and sectioned at 7 μm using 
the CryoJane Tape-Transfer System (Leica) for samples containing 
fluorescent protein-expressing cells. Staining with haematoxylin and 
eosin (H&E) and safranin O, terminal deoxynucleotidyl transferase 
dUTP nick end labelling (TUNEL) staining and immunohistochemical 
staining for BrdU, CD31 and COL2 are routinely used in our labora- 
tory and have all been described previously. For SOX9, COL1, 
CPT1a, GLUT1 and FOXO3a immunohistochemical staining, sections 
were deparaffinised and blocked for 30 min in 0.1 M Tris-HCl, 0.15 M 
NaCl, pH 7.6 (TNT) with 0.5% Blocking Reagent (NEN, PerkinElmer) 
and 20% normal goat serum (DAKO). Subsequently, sections were 
incubated overnight with a rabbit anti-SOX9 primary antibody (Novus 
Biologicals; NBP1-85551; 1:100), rabbit anti-COL1 primary antibody 
(Novus Biologicals; NB600-408; 1:100), rabbit anti-CPT1a primary 
antibody (Cell Signaling Technology; no. 12252; 1:50), rabbit anti- 
GLUT1 primary antibody (Cell Signaling Technology; no. 12939; 1:100) 
or rabbit anti-FOXO3a primary antibody (Cell Signaling Technol- 
ogy; no. 2497; 1:100) diluted in TNT with 0.5% blocking reagent, fol- 
lowed by three washes with TNT containing 0.05% Tween-20. Next, 
slides were incubated for 1 h with an Alexa Fluor 546- or Alexa Fluor 
488-conjugated goat anti-rabbit secondary antibody (Invitrogen; 
A-11010 and A-11034) diluted 1:200 in TNT/0.5% blocking reagent, 
washed and counterstained with Hoechst 33342 (20 μg ml−1 in PBS; 
Invitrogen). Stainings omitting the primary antibody were used as 
negative controls. 


Images were taken on a Zeiss Axioplan 2 light microscope, Zeiss 
LSM510-META NLO multi-photon confocal microscope or Zeiss 
LSM880 confocal laser scanning microscope. Histomorphometry 
was performed using the Zeiss AxioVision software, ImageJ software 
(National Institutes of Health) and CellProfiler software. Quantifica- 
tion of blood vessels or proliferating cells was performed by respec- 
tively counting CD31+ vessels or BrdU+ cells in a 250-μm-wide region 
of interest adjacent to the graft surface. Apoptotic or chondrogenic 
cells were quantified by respectively counting the number of TUNEL+ 
or SOX9+ cells and the total number of cells in a 0.015 mm2 region of 
interest near the graft surface at the centre of the graft. Quantification 
of cartilage was performed by outlining COL2+ or safranin O+ areas 
within the total callus area (for fractures and grafts) or the central graft 
callus area (half of total graft length). Quantification of woven bone was 
performed by outlining areas of macroscopically defined immature 
bone within the total callus area. Quantification of FOXO3a+ nuclei 
was performed using the ‘cell/particle counting and scoring’ pipeline 
in CellProfiler, in a region of interest encompassing the total callus 
area (for fractures) or the central graft callus area (half of total graft 
length). For all quantifications, measurements were made on at least 
three different sections throughout the sample. 


Computational model of bone-graft healing 

We used a previously established multiscale computational frame- 
work of boneregeneration that quantitatively describes the interplay 
between cells, growth factors, nutrient levels and blood vessels!" In 
short, this multiscale model combines ten partial differential equations 
of the taxis-reaction-diffusion type at the tissue level with a discrete 
agent-based approach at the vascular level, including eight intracel- 
lular variables for the endothelial cells. At the tissue level, the model 
accounts for the various key processes of intramembranousandendo- 
chondral ossification that occur during the soft and hard callus phase of 
bone healing. The partial differential equations describe the evolution 
intime and space of theskeletal progenitor cell density, fibroblast den- 
sity, chondrocyte density, osteoblast density, fibrous matrix density, 
cartilaginous matrix density, bone matrix density, osteochondrogenic 
growth factor concentration, vascular growth factor concentration 
and nutrient concentration. For simplification purposes, only one 
generic osteochondrogenic growth factor and one nutrient parameter 
is included in the computational model, which respectively repre 
sentthe effects of multiple growth factors (for example, transforming 
growth factors or bone morphogenetic proteins) and nutrients (such 
as oxygen, glucose, amino acids or lipids) present during bone heal- 
ing. The assumption is made that the net result of all growth factors 
present will be to promote chondrogenesis and osteogenesis, and 
thus if local levels of the osteochondrogenic growth factor reach a 
certain threshold (modelled using a sixth-order Hill Function) it will 
induce differentiation of skeletal progenitor cells. The decision on 
whether the end result of this differentiation event is chondrogenic 
or osteogenicismade by the nutrient parameter. The influence of the 
generic osteochondrogenic growth factor on skeletal progenitor cell 
differentiationis promoting chondrogenic differentiation when local 
nutrientlevelsare low, and promoting osteogenic differentiation when 
local nutrientlevelsare high. Cell types thatare considered at the tissue 
scale (skeletal progenitor cells, chondrocytes, osteoblasts, fibroblasts)
can migrate (only skeletal progenitor cells and fibroblasts), proliferate,
differentiate and produce growth factors (generic osteochondrogenic
growth factor or angiogenic growth factor) and extracellular
matrix (cartilage, bone or fibrous tissue). Blood vessels are modelled
at both a cellular level (representing the developing vasculature with
discrete endothelial cells) and an intracellular level (that defines the
internal dynamics of every endothelial cell), and serve as the nutrient
source. At the cellular level, the development of the discrete vascular
tree (composed of endothelial cells) is determined by three different
processes: sprouting (the formation of a new branch, headed
by a tip endothelial cell), vascular growth (the extension of the branch
due to tip cell migration) and anastomosis (the fusion of two branches).
An anastomosis between blood vessels allows for blood flow and the
delivery of nutrients. The intracellular level considers a number of
molecular players that govern endothelial cell movement (VEGFR2,
DLL4, Notch and actin).
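
The threshold behaviour described for the growth factor can be illustrated with a minimal sketch. It assumes a standard sixth-order Hill function and, purely for illustration, a second Hill term that splits the differentiation drive between the chondrogenic and osteogenic fates according to the nutrient level. The threshold values and the function names are hypothetical and are not taken from the model's published equations.

```python
def hill(concentration, threshold, order=6):
    """Sigmoidal Hill response in [0, 1]; equals 0.5 at the threshold."""
    if concentration <= 0:
        return 0.0
    c_n = concentration ** order
    return c_n / (threshold ** order + c_n)


def differentiation_drive(growth_factor, nutrient,
                          gf_threshold=0.5, nutrient_threshold=0.5):
    """Return (chondrogenic, osteogenic) drive for progenitor cells.

    Both thresholds are hypothetical illustration values. The growth
    factor switches differentiation on; the nutrient level decides
    which fate receives that drive (an assumed functional form).
    """
    drive = hill(growth_factor, gf_threshold)
    osteogenic_share = hill(nutrient, nutrient_threshold)
    return drive * (1.0 - osteogenic_share), drive * osteogenic_share


# Same growth factor level, low versus high nutrient availability.
low_nutrient = differentiation_drive(growth_factor=1.0, nutrient=0.05)
high_nutrient = differentiation_drive(growth_factor=1.0, nutrient=0.95)
```

With these assumed parameters, the low-nutrient case is dominated by the chondrogenic term and the high-nutrient case by the osteogenic term, which mirrors the qualitative rule stated in the text.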

While the blood vessels are modelled discretely, continuous variables
are used for nutrient density, bone density, cartilage density and
fibrous tissue density (included in the model but not relevant for the
current setup and therefore not shown). The colour scale for nutrients,
bone and cartilage thus indicates a continuous gradient going from
complete absence of a parameter (‘0’ value; nutrients, bone or cartilage
are not present at that location) to complete saturation of a parameter
(‘1’ value; a location is completely filled with nutrients, bone or
cartilage). All values in between 0 and 1 represent partial filling of a
location with a parameter. For the ‘tissue’ continuous variables (bone,
cartilage, fibrous tissue), the sum of all tissues is 1, meaning that if a
location is completely filled with bone (value ‘1’), no cartilage can exist
at the same location (value ‘0’). However, since the variables are
continuous, a specific location can contain both a fraction of bone and a
fraction of cartilage. Tissues, nutrients and blood vessels are modelled
in separate spaces and can thus ‘co-exist’ in the same location. Since the
nutrient parameter is also continuous, it has an independent scale going
from no nutrients (value ‘0’) to saturating levels of nutrients (value ‘1’,
which we define as the level of nutrients found inside a modelled blood
vessel).

By adapting the geometry and boundary conditions to the bone-graft
setup, the influence of a filter placed in between graft and muscle on
the healing process can be predicted in silico. Detailed information on
the equations, parameter values and implementation can be found in
ref." Information on the boundary and initial conditions used in this
study can be found in Extended Data Fig. 3.


Isolation of primary cells 
Periosteal cells and trabecular osteoblasts were isolated from the long
bones of 8-10-week-old mice as described. For the isolation of periosteal
cells, femurs and tibias were dissected free of muscle and connective
tissue under sterile conditions. Subsequently, the epiphyses
were protected from digestion by submerging them in 5% low melting
point agarose (SeaPlaque, Lonza) and periosteal cells were isolated by
enzymatic digestion using 3 mg ml⁻¹ collagenase II (Gibco) and 4 mg ml⁻¹
dispase (Gibco) in α-minimal essential medium (α-MEM; Gibco)
supplemented with 1% penicillin/streptomycin (100 units ml⁻¹ and
100 µg ml⁻¹, respectively; Gibco). Cells from the first digest (10 min)
were discarded as they contain cells from remaining muscle and connective
tissue, and periosteal cells were obtained by a subsequent 1 h digest.
The cells were passed through a 70-µm nylon mesh (BD Falcon), washed
twice and cultured in α-MEM with 1% penicillin/streptomycin and 10% FBS
(HyClone) in a humidified incubator at 37 °C with 5% CO₂. For the
isolation of
trabecular osteoblasts, femurs and tibias were cleaned thoroughly 
to remove muscle, connective tissue and periosteum. Subsequently, 
bones were incubated in collagenase-dispase (3 mg ml⁻¹ collagenase
II and 4 mg ml⁻¹ dispase in α-MEM with 1% penicillin/streptomycin) for
20 min to remove remaining periosteal cells. Next, epiphyses were cut
away, bone marrow was flushed out and the bone was cut into small
pieces. Trabecular osteoblasts were isolated by incubating the bone
fragments with collagenase-dispase for 30 min. Cells were passed
through a 70-µm nylon mesh, washed twice and cultured in α-MEM
supplemented with 1% penicillin/streptomycin and 10% FBS at 37 °C
with 5% CO₂. Cells from passage 2-3 were used for all experiments.
Growth plate-derived chondrocytes and calvarial osteoblasts were 
isolated from 3-5-day-old mice as described. For murine growth
plate-derived chondrocytes the resting zones of the growth plates
from the distal femora and proximal tibiae were dissected free from
surrounding tissue and pre-digested for 30 min with 1 mg ml⁻¹
collagenase II in α-MEM with 1% penicillin/streptomycin on a shaker at
room temperature. Cartilage fragments were then washed twice and
subsequently digested for 3 h in a 2 mg ml⁻¹ collagenase II solution in
α-MEM with 1% penicillin/streptomycin on a shaker at 37 °C. The cell
suspension was then filtered through a 40-µm nylon mesh, washed and
cultured in α-MEM supplemented with 1% penicillin/streptomycin and
10% FBS at 37 °C with 5% CO₂. Calvarial osteoblasts were prepared by 6
sequential 15-min digestions of calvaria from 3-5-day-old mice in PBS
containing 1 mg ml⁻¹ collagenase II and 2 mg ml⁻¹ dispase. Cells isolated
in fractions 2-6 were pooled and cultured in α-MEM supplemented
with 1% penicillin/streptomycin and 10% FBS at 37 °C with 5% CO₂. Cells
from passage 2-3 were used for all experiments. 

For isolation of rib chondrocytes, anterior rib cages were dissected 
from 5-day-old mice. Isolated rib cages were pre-digested on a shaker
for 30 min at room temperature with 1 mg ml⁻¹ collagenase II (Gibco)
dissolved in α-MEM supplemented with 1% penicillin/streptomycin.
Rib fragments were subsequently digested for 3 h in a 2 mg ml⁻¹
collagenase II solution in α-MEM with 1% penicillin/streptomycin on a
shaker at 37 °C. The obtained cell suspension of the second digest
was filtered through a 40-µm nylon mesh and single cells were recovered
by centrifugation. Cells were cultured in a humidified incubator
at 37 °C with 5% CO₂ in α-MEM supplemented with 1% penicillin/
streptomycin and 10% FBS. Cells from passage 2-3 were used for all 
experiments. 

Isolation of mouse skeletal stem cells was adapted from a previ- 
ously described protocol. Long bones of 3-5-day-old mice were 
dissected, muscle was cleared away carefully to preserve the periosteum
and bones were minced using a scalpel. Bone fragments were
then digested in α-MEM supplemented with 3 mg ml⁻¹ collagenase II,
4 mg ml⁻¹ dispase (both from Gibco) and 100 U ml⁻¹ DNase I (Sigma)
at 37 °C for 3 sequential 15-min digests. Cell fractions were pooled
and passed through a 70-µm nylon mesh, washed with PBS containing
2% FBS and stained with antibodies against CD45, TER119, TIE2,
CD105, CD90.2, CD249 (also known as 6C3) (BioLegend), CD51 (BD
Pharmingen) and CD200 (eBioscience), and with the viability dye
7-aminoactinomycin D (7AAD; BD Pharmingen). Immunophenotypically-defined
skeletal stem cells (7AAD⁻CD45⁻TER119⁻TIE2⁻CD51⁺
CD105⁻CD90.2⁻CD249⁻CD200⁺; Extended Data Fig. 9a) were sorted
on a BD FACSAria II (BD Biosciences). Single colour controls were
used to set compensations and fluorescence minus one controls
were used to set gates. Sorted cells were cultured in a humidified
incubator at 37 °C with 2% O₂ and 7.5% CO₂ in α-MEM supplemented
with 1% penicillin/streptomycin and 10% FBS. For metabolic analyses,
skeletal stem cells were grown in atmospheric O₂ levels with 5% CO₂
to enable direct comparison with other cell types. Cells from passage
2-3 were used for all experiments. For flow cytometric analysis of
culture-expanded skeletal stem cells, cells were gated again for the
CD51⁺CD105⁻CD90.2⁻CD249⁻CD200⁺ population to limit analysis to
the stem cell fraction. 

For the isolation of skeletal muscle-derived cell populations, 
hindlimb skeletal muscles, including quadriceps, soleus, gastrocnemius
and tibialis anterior, were dissected from 8-week-old CAG-DsRed
mice, minced using a scalpel and digested in α-MEM medium supplemented
with 3 mg ml⁻¹ collagenase II, 4 mg ml⁻¹ dispase and 100 U ml⁻¹
DNase I at 37 °C for 60 min. Every 15 min, samples were pipetted up
and down vigorously using a 10-ml serological pipette to break up tissue
fragments. Cell suspensions were passed through a 70-µm nylon
mesh, washed with PBS containing 2% FBS and stained with antibodies
against CD45, TER119, CD31, F4/80 and CD146 (BioLegend), and with
7AAD (BD Pharmingen). Immunophenotypically-defined macrophages
(7AAD⁻CD45⁺F4/80⁺), endothelial cells
(7AAD⁻CD45⁻TER119⁻F4/80⁻CD31⁺CD146⁺) and pericytes
(7AAD⁻CD45⁻TER119⁻F4/80⁻CD31⁻CD146⁺)
(Extended Data Fig. 9b) were sorted on a BD FACSAria II. Single colour
controls were used to set compensations and fluorescence minus one
controls were used to set gates. Sorted cells were used for co-cultures
with periosteal cells in micromasses. 


Cell lines

The C3H10T1/2 cell line, used as a skeletal progenitor cell model, was
obtained from the RIKEN Cell Bank and cultured in a humidified incubator
at 37 °C with 5% CO₂ in α-MEM with 1% penicillin/streptomycin and
10% FBS. Cells were routinely tested and found negative for mycoplasma 
contamination. 


Nutrient-deprivation assays 
Cells were seeded at 3,000 cells per cm² in basal DMEM (glucose- and
glutamine-free; Gibco) supplemented with 1% penicillin/streptomycin,
5 mM D-(+)-glucose (Sigma-Aldrich), 2 mM L-glutamine (Gibco), 1 mM
sodium pyruvate (Gibco) and 10% dialysed FBS (HyClone). After 24 h,
cells were washed with PBS and switched to control medium (basal
DMEM with 1% penicillin/streptomycin, 5 mM glucose, 2 mM L-glutamine,
1 mM sodium pyruvate and 10% dialysed FBS), SD medium
(basal DMEM with 1% penicillin/streptomycin, 5 mM glucose, 2 mM
L-glutamine, 1 mM sodium pyruvate and 1% dialysed FBS), glucose-
deprivation medium (basal DMEM with 1% penicillin/streptomycin,
0.5 mM glucose, 2 mM L-glutamine, 1 mM sodium pyruvate and 10%
dialysed FBS), glutamine-deprivation medium (basal DMEM with 1%
penicillin/streptomycin, 5 mM glucose, 0.2 mM L-glutamine, 1 mM
sodium pyruvate and 10% dialysed FBS), CND medium (basal DMEM
with 1% penicillin/streptomycin, 0.5 mM glucose, 0.2 mM L-glutamine,
1 mM sodium pyruvate and 1% dialysed FBS) or LRS medium (basal
DMEM with 1% penicillin/streptomycin, 5 mM glucose, 2 mM L-glutamine,
1 mM sodium pyruvate and 10% lipid-reduced FBS). LRS was
made by mixing FBS with fumed silica (Sigma) at 20 mg ml⁻¹ for 3 h at
room temperature, followed by centrifugation at 2,000g for 15 min
and filtration of the supernatant through a 0.45-µm-pore-size filter.
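
Because the six media differ from one another in only one to three components, it can help to tabulate them as data. The sketch below encodes the compositions listed above (all media also contain 1% penicillin/streptomycin) and reports how each medium deviates from the control. The dictionary keys and the helper name `changes_from_control` are illustrative choices, not part of the original protocol.

```python
# Control medium: basal DMEM plus the supplements listed in the text.
CONTROL = {
    "glucose_mM": 5.0,
    "glutamine_mM": 2.0,
    "pyruvate_mM": 1.0,
    "serum_percent": 10.0,
    "serum_type": "dialysed FBS",
}

# Each medium is the control with the stated components changed.
MEDIA = {
    "control": dict(CONTROL),
    "SD": {**CONTROL, "serum_percent": 1.0},
    "glucose-deprivation": {**CONTROL, "glucose_mM": 0.5},
    "glutamine-deprivation": {**CONTROL, "glutamine_mM": 0.2},
    "CND": {**CONTROL, "glucose_mM": 0.5, "glutamine_mM": 0.2,
            "serum_percent": 1.0},
    "LRS": {**CONTROL, "serum_type": "lipid-reduced FBS"},
}


def changes_from_control(name):
    """Return only the components that differ from the control medium."""
    return {key: value for key, value in MEDIA[name].items()
            if value != CONTROL[key]}


for medium in MEDIA:
    print(medium, changes_from_control(medium))
```

Listing only the differences makes it easy to confirm that each deprivation condition changes a single variable, while CND combines the glucose, glutamine and serum reductions.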

In certain experiments cultures were supplied with actinomycin
D (transcription inhibitor; Sigma-Aldrich), cycloheximide (translation
inhibitor; Sigma-Aldrich), chloroquine (lysosomal inhibitor;
Sigma-Aldrich) or etomoxir (CPT1 inhibitor; Merck-Millipore) at the
concentrations indicated in the text. For lipid rescue experiments, SD
medium was supplemented with very low density lipoproteins (VLDL;
Calbiochem) at a concentration of 607 µg triglycerides per ml FBS,
palmitic or oleic acid (Sigma-Aldrich) at the indicated concentrations
or a mixture of PUFAs (10 µM linoleic acid, 15 µM α-linolenic acid, 10 µM
arachidonic acid and 15 µM docosahexaenoic acid; all from Sigma-
Aldrich). Triglycerides were incubated in FBS for 30 min at 37 °C and
fatty acids (dissolved in ethanol) were complexed to fatty acid-free
bovine serum albumin (BSA) (Sigma-Aldrich) for 1 h at 37 °C before
adding to the culture medium, as described previously. All supplements
were added at the start of the experiment and were present for
the entire duration of the cultures. 


Differentiation assays 

To assess chondrogenic differentiation, 150,000 periosteal cells were
resuspended in 10 µl of control medium and seeded as micromasses
in the middle of a 24-well plate. Cells were allowed to attach for 1 h
at 37 °C, after which 0.5 ml of control, SD or LRS medium containing
10 ng ml⁻¹ recombinant human TGFβ1 (Peprotech), 50 µM L-ascorbic
acid 2-sulfate (Sigma-Aldrich) and 20 µM Y-27632 (Rho kinase inhibitor;
Axon Medchem) was added to the wells. Medium was refreshed
every other day and after 9 days micromasses were either stained with
alcian blue or used for RNA isolation. For chondrogenic differentiation
in the presence of muscle-derived cells, micromasses were made
using 100,000 periosteal cells derived from Sox9-GFP mice and 50,000
skeletal muscle-derived macrophages, endothelial cells, pericytes or 
unsorted cells obtained from CAG-DsRed mice. 

For chondrogenic differentiation in pellets, 200,000 periosteal cells
were placed in a 5-ml polystyrene tube in 1 ml of control, SD or LRS
medium containing 10 ng ml⁻¹ recombinant human TGFβ1 (Peprotech)
and 50 µM L-ascorbic acid 2-sulfate (Sigma-Aldrich), supplemented with
vehicle (1% ethanol in 4% fatty acid-free BSA in saline), 60 µM oleate or
a mixture of PUFA (10 µM linoleic acid, 15 µM α-linolenic acid, 10 µM
arachidonic acid and 15 µM docosahexaenoic acid) complexed to fatty
acid-free BSA. Tubes were centrifuged for 5 min at 500g and placed in a
humidified incubator at 37 °C. Medium was changed every 3 days and
after 21 days pellets were fixed in 4% paraformaldehyde for 10 min and
processed for paraffin histological sectioning. 

For osteogenic differentiation, periosteal cells were seeded
at 30,000 cells per cm² in control medium and cultured for 3 days in
order to reach full confluence. Cells were then switched to control, SD
or LRS medium containing 50 µM L-ascorbic acid 2-sulfate and 10 mM
β-glycerophosphate (Sigma-Aldrich). After 21 days, cells were either
stained with alizarin red S to detect mineralization or used for RNA
isolation.


Metatarsal cultures 

Metatarsal rudiments were dissected from E16.5 Col1a1-cre/ERT2,
DsRed embryos and stripped of skin. The middle three metatarsals
were kept together as triads and cultured for 7 days on a Falcon insert
membrane (pore size 0.4 µm) in 12-well plates in 1 ml of BGJb culture
medium (Gibco) supplemented with 25 µg ml⁻¹ L-ascorbic acid 2-sulfate,
10 mM β-glycerophosphate, and FBS (10% or 1%). When indicated, a
mixture of PUFAs (10 µM linoleic acid, 15 µM α-linolenic acid, 10 µM
arachidonic acid and 15 µM docosahexaenoic acid) complexed to fatty
acid-free BSA or vehicle (1% ethanol in 4% fatty acid-free BSA in saline)
was added to the culture medium. At the end of the cultures the
metatarsals were fixed overnight in 2% paraformaldehyde in PBS and processed
for histochemistry or immunohistochemistry. 


Flow cytometry

Cell death was detected using annexin V-FITC and propidium iodide 
(Dead Cell Apoptosis Kit; Invitrogen), or using active caspase 3-FITC 
(FITC Active Caspase-3 Apoptosis Kit; BD Pharmingen). Proliferation 
was assessed by staining with a PE-conjugated mouse anti-Ki-67 antibody
(BD Pharmingen; no. 556027; 1:10) and Hoechst 33342 (40 µg ml⁻¹;
Invitrogen) after fixation and permeabilization of the cells (BD Cytofix/
Cytoperm Kit, BD Biosciences). Intracellular SOX9 levels were quantified
by staining with an Alexa Fluor 647-conjugated rabbit anti-SOX9
antibody (Cell Signaling Technology; no. 71273; 1:100) after fixation and
permeabilization of the cells. Gating for SOX9-high cells was set to have
approximately 10% SOX9-high cells in control conditions. Single colour
controls were used to set compensations and fluorescence minus one 
controls were used to set gates. 


Immunocytochemistry 

For immunofluorescence microscopy, cells grown on coverslips were
fixed with 4% paraformaldehyde, permeabilized with 0.5% Triton-X100
in PBS and blocked with PBS containing 5% BSA, 5% normal goat serum
and 0.5% Tween-20. Next, cells were incubated overnight at 4 °C with
primary antibodies (rabbit anti-FOXO1, Cell Signaling Technology,
no. 2880, 1:100; rabbit anti-FOXO3a, Cell Signaling Technology, no.
2497, 1:100) in blocking buffer, followed by three washes with PBS/
Tween-20. Slides were subsequently incubated for 2 h with secondary
antibodies (Alexa Fluor 488-conjugated goat anti-rabbit; 1:500) in PBS 
containing 5% BSA and 0.5% Tween-20, washed and counterstained 
with Hoechst 33342. Stainings omitting the primary antibody were 
used as negative controls. 

For staining of lipid droplets with 1,6-diphenyl-1,3,5-hexatriene 
(DPH), cells grown on coverslips were washed with PBS and fixed with 
3.7% formaldehyde in PBS. DPH staining solution was prepared by 
diluting a 2 mM DPH (Sigma-Aldrich) stock (in DMSO) in PBS to a final
concentration of 4 µM as previously described. Cells were stained
with DPH for 30 min, washed and nuclei were counterstained using 
TO-PRO-3 (Molecular Probes). 

For tracking lipid movement between lipid droplets and mitochondria,
cells were incubated with the fluorescent fatty acid analogue
BODIPY 558/568 C12 (Red-C12; Invitrogen) at 1 µM in culture medium for
16 h (ref.*). Cells were then washed three times with culture medium,
incubated for 1 h in culture medium to allow the fluorescent lipids to
incorporate into lipid droplets, and then chased for the time indicated
in control or SD medium. Mitochondria were labelled with 100 nM
MitoTracker Deep Red FM (Invitrogen) for 30 min before the end of
the experiment. Cells were fixed and lipid droplets were stained with
DPH as described above.

For measurement of autophagic flux, cells grown on coverslips were
transfected with 1 µg of an RFP-GFP-LC3 tandem construct using the
X-tremeGENE HP transfection reagent (Roche) according to the
manufacturer’s instructions. After 24 h, cells were washed with PBS and used
for subsequent experiments. Since the GFP-LC3 loses fluorescence
owing to lysosomal acidic and degradative conditions but the RFP-LC3
does not, autophagosomes in the cell are seen as green-yellow puncta,
whereas autophagolysosomes are red.

Images were taken on a Zeiss LSM510-META NLO multi-photon confocal
microscope or Zeiss LSM880 confocal laser scanning microscope,
and prepared using Adobe Photoshop CS5 (Adobe) and ImageJ. LC3
puncta and DPH⁺ lipid droplets per cell were counted manually in
ImageJ, while overlap between MitoTracker and Red-C12 in manually
delineated cells was performed using the ‘co-localization’ plugin for 
ImageJ after thresholding of individual frames. 


Western blot analysis 

Total cell lysates were obtained by lysing cells in 25 mM Tris-HCl buffer
(pH 7.6) containing 150 mM NaCl, 1% NP-40, 1% sodium deoxycholate,
0.1% SDS, 1× cOmplete protease inhibitor cocktail (Roche) and 1×
PhosSTOP phosphatase inhibitor cocktail (Roche). For cytoplasmic
and nuclear extracts, cells were first lysed in 20 mM Hepes (pH 7.9)
containing 10 mM KCl, 1.5 mM MgCl₂, 1 mM EDTA, 0.5% NP40, 1 mM
DTT, 1 mM Na₃VO₄, 20 mM NaF, 1 mM PMSF, 5 µg ml⁻¹ aprotinin,
5 µg ml⁻¹ leupeptin and 0.33 µg ml⁻¹ antipain. Following 15 min
incubation at 4 °C, the cell lysates were passed 10 times through a
26 gauge needle. After centrifugation for 1 min at 18,000g, the
supernatant (cytoplasmic proteins) was removed and the pellet containing
the nuclear protein fraction was resuspended in 50 mM Hepes (pH 7.9)
containing 500 mM NaCl, 1% NP40, 5 µg ml⁻¹ aprotinin, 5 µg ml⁻¹
leupeptin and 0.33 µg ml⁻¹ antipain, and sonicated. Proteins (10 µg,
except for detection of LC3 for which 20 µg was used) were separated by
SDS-PAGE and transferred to a nitrocellulose membrane (GE Healthcare).
Membranes were blocked with 5% dry milk in Tris-buffered saline with
0.1% Tween-20 for 30 min at room temperature and incubated overnight
at 4 °C with primary antibodies (rabbit anti-SOX9, Novus Biologicals,
NBP1-85551, 1:2,000; rabbit anti-FOXO1, Cell Signaling Technology,
no. 2880, 1:1,000; rabbit anti-FOXO3a, Cell Signaling Technology,
no. 2497, 1:1,000; rabbit anti-LC3B, Cell Signaling Technology,
no. 3868, 1:500; mouse anti-β-actin, Sigma, A541, 1:10,000; mouse
anti-lamin A/C, Santa Cruz Biotechnology, sc-376248, 1:5,000) diluted
in blocking buffer. Signals
were detected by enhanced chemiluminescence (Perkin Elmer) after 
incubation with HRP-conjugated secondary antibodies (DAKO). For 
gel source data, see Supplementary Fig. 1. 


Metabolic assays 

Glucose and lactate levels in culture medium were measured on an
AU640 Chemistry Analyzer (Beckman Coulter). Glucose consumption
was calculated by subtracting the remaining amount of glucose in
the culture medium after 24 h of incubation with cells from the amount
of glucose in unspent medium, and normalized for time and for cell
number via DNA quantification. In a similar way, lactate secretion was
calculated by subtracting lactate levels in unspent medium from the 
levels in medium incubated for 24 h with cells. Oxygen consumption 
was determined on a Seahorse XF24 Analyzer (Seahorse Bioscience) 
using 50,000 cells per well. The assay medium was unbuffered DMEM 
(Sigma) supplemented with 5 mM D-glucose and 2 mM L-glutamine,
pH 7.4. For quantification of FAO-linked oxygen consumption the
difference in OCR before and after injection of etomoxir (100 µM final
concentration) was calculated.
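
The subtraction and normalization steps described in this paragraph can be sketched as follows. The output unit (nmol per hour per µg DNA), the explicit medium volume argument and the averaging of several OCR readings are assumptions made for illustration; the source states only that values were normalized for time and for DNA content and that the OCR difference around etomoxir injection was taken.

```python
from statistics import mean


def glucose_consumption(unspent_mM, spent_mM, volume_ml, hours, dna_ug):
    """Glucose consumed, in nmol per hour per microgram DNA (assumed unit)."""
    # mM x ml gives micromoles; multiply by 1000 for nanomoles.
    consumed_nmol = (unspent_mM - spent_mM) * volume_ml * 1000.0
    return consumed_nmol / hours / dna_ug


def lactate_secretion(unspent_mM, spent_mM, volume_ml, hours, dna_ug):
    """Lactate secreted: the same calculation with the subtraction reversed."""
    secreted_nmol = (spent_mM - unspent_mM) * volume_ml * 1000.0
    return secreted_nmol / hours / dna_ug


def fao_linked_ocr(ocr_before, ocr_after):
    """Etomoxir-sensitive OCR: mean reading before minus mean reading after."""
    return mean(ocr_before) - mean(ocr_after)


# Hypothetical readings, for illustration only.
glucose_rate = glucose_consumption(5.0, 4.0, volume_ml=0.5, hours=24, dna_ug=2.0)
fao_ocr = fao_linked_ocr([100, 102, 98], [80, 82, 78])
```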

For measurement of glycolysis, cells were incubated for 6 h in growth
medium containing 0.3 µCi ml⁻¹ [5-³H]D-glucose (PerkinElmer). The
culture medium was then transferred into glass vials sealed with rubber
caps. ³H₂O was captured in hanging wells containing a Whatman
paper soaked with H₂O over a period of 48 h at 37 °C to reach saturation.
Radioactivity was determined in the paper by liquid scintillation
counting and values were normalized to DNA content. 

For glucose oxidation, cells were incubated for 6 h in growth medium
containing 0.6 µCi ml⁻¹ [6-¹⁴C]D-glucose (PerkinElmer). To stop cellular
metabolism, 250 µl of a 2 M perchloric acid solution was added and wells
were covered with a Whatman paper soaked with 1× hyamine hydroxide.
CO₂ released during the oxidation of glucose was absorbed into the
paper overnight at room temperature. Radioactivity in the paper was
determined by liquid scintillation counting, and values were normalized
to DNA content.

FAO was measured after incubation of the cells with 3 µCi ml⁻¹
[9,10-³H]palmitate (PerkinElmer), complexed to BSA, for 2 h. Then,
the culture medium was transferred into glass vials sealed with rubber
caps. ³H₂O was captured in hanging wells containing a Whatman
paper soaked with H₂O over a period of 48 h at 37 °C. Radioactivity in
the paper was determined by liquid scintillation counting, and values
were normalized to DNA content.


Metabolite diffusion assay 

Diffusion rates were measured in custom-designed diffusion chambers 
according to a previously established protocol. Chambers were fabricated
in a polydimethylsiloxane (PDMS) device on a glass substrate
with medium reservoirs that contained fluorescent tracer molecules.
2-NBDG (342 Da) and BODIPY FL C16 (FL-C16; Invitrogen) complexed
to fatty acid-free BSA (66.5 kDa) were used as fluorescent analogues to
evaluate the diffusion rates of glucose and fatty acids, respectively, in
separate runs. Tracer movement was assessed in square borosilicate
glass capillaries with an inner width of 0.8 mm and wall thickness of
0.16 mm (VitroCom). Collagen type I gels (5 mg ml⁻¹) containing periosteal
cells (5 million per ml) were polymerized within the capillaries,
after which the capillaries were connected to the PDMS reservoirs. This
initiated the diffusion process resulting from a concentration gradient
between the tracer-saturated medium reservoir (250 µM 2-NBDG or
25 µM FL-C16 complexed to 25 µM BSA) and the tracer-free capillary.
Tracer gradients within the capillaries were imaged on a confocal
fluorescence laser scanning microscope (FV1000, Olympus) equipped with
a UPLSAPO 10× air objective (NA 0.40) focused on the middle plane
of the collagen gel. Focus drift was compensated using an IX81-ZDC
module that focuses a 785-nm laser on the glass capillary surface to
stably reproduce the focus position for each capillary position and for
every acquisition time point. Images were acquired as a time series with
10-min intervals over a total period of 5 h, at 37 °C. Tracer-free collagen
gels were visualized to correct the image sequences for any background
fluorescence intensity. A tracer-saturated collagen gel was visualized
during each diffusion experiment to compensate for potential
photobleaching of tracer and to normalize the gradient profiles for further
processing. Image sequences were processed in ImageJ. Diffusion rates
were obtained by least squares fitting an analytical solution of Fick’s 
second diffusion law to the resulting averaged axial intensity profiles 
in MATLAB (MathWorks). 
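
A least-squares fit of this kind can be sketched in a few lines. The example below is written in Python rather than MATLAB and assumes the textbook solution of Fick's second law for a semi-infinite medium with a constant-concentration source, C(x, t)/C0 = erfc(x / (2·sqrt(D·t))). The source does not state which analytical solution or optimizer was used, so both the profile function and the golden-section search are illustrative assumptions, as are all function names and the synthetic data.

```python
import math


def fick_profile(x_mm, t_s, diffusivity):
    """Normalized tracer level C/C0 at depth x_mm (mm) after t_s seconds."""
    return math.erfc(x_mm / (2.0 * math.sqrt(diffusivity * t_s)))


def squared_error(diffusivity, profiles):
    """Sum of squared residuals over all time points and positions."""
    return sum(
        (intensity - fick_profile(x, t, diffusivity)) ** 2
        for t, points in profiles
        for x, intensity in points
    )


def fit_diffusivity(profiles, log10_low=-7.0, log10_high=-2.0, iterations=80):
    """Golden-section search on log10(D), D in mm^2/s, for the best fit."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = log10_low, log10_high
    for _ in range(iterations):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if squared_error(10.0 ** c, profiles) < squared_error(10.0 ** d, profiles):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


# Synthetic, noise-free profiles generated with a hypothetical D.
true_d = 5e-4  # mm^2/s, illustration value only
positions = [0.1 * i for i in range(1, 21)]
profiles = [
    (t, [(x, fick_profile(x, t, true_d)) for x in positions])
    for t in (600.0, 1800.0, 3600.0)
]
estimated_d = fit_diffusivity(profiles)
```

Searching on a logarithmic scale keeps the fit well behaved across several orders of magnitude of D. With real, noisy intensity profiles a dedicated optimizer would be a reasonable replacement for the simple search used here.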


Gene targeting 

To silence Sox9, Cpt1a, Atg5, Foxo1 or Foxo3a, we transduced cells in the
presence of 8 µg ml⁻¹ polybrene (Sigma-Aldrich) with a lentivirus
carrying a shRNA against SOX9[51] (Addgene plasmid repository no. 40645;
multiplicity of infection (MOI) 50), CPT1a (MISSION, Sigma-Aldrich;
MOI 25) or ATG5 (MISSION, Sigma-Aldrich; MOI 25), or concomitantly
with shRNAs against Foxo1 and Foxo3a (MISSION, Sigma-Aldrich; each
at MOI 25). To overexpress SOX9 we transduced cells, in the presence of
8 µg ml⁻¹ polybrene, with a lentivirus carrying a SOX9-overexpression
plasmid[51] (Addgene plasmid repository no. 36979; MOI 150). A nonsense
scrambled (Scr) shRNA sequence or empty vector was used as
a negative control. After 24 h, virus-containing medium was changed
to normal culture medium and 48 h later, cells were used for further 
experiments. Target knockdown was confirmed by western blot. 

To silence expression of Foxo genes using CRISPR-Cas9, we transduced
Cas9-expressing C3H10T1/2 cells (Cas9: Addgene plasmid
repository no. 48139) with a lentivirus carrying doxycycline-inducible
sgRNAs against Foxo1 (GenBank accession number NM_019739)
(5′-TTGTAAAGGTGTCTTCACGGGGG-3′) and Foxo3a (GenBank accession
number NM_019740) (5′-CATTCTGAACGCGCATGAAGCGG-3′)
(doxycycline-inducible plasmid: Addgene plasmid repository no.
70183). Cells were cultured in the presence of doxycycline (250 ng ml⁻¹)
for 72 h before experiments.


Quantification of active FOXO levels 

Levels of active FOXO were measured using the TransAM FKHR (FOXO1) 
DNA-binding ELISA (Active Motif) on nuclear protein extracts, and 
normalized to total nuclear protein input as measured by bicinchoninic 
acid assay (Pierce BCA Protein Assay Kit; Thermo Scientific). 


Total RNA extraction and RT-qPCR analysis 

Total RNA from cultured cells was extracted using the RNeasy Mini 
Kit (Qiagen). Total RNA from cortical bone (femurs of eight-week- 
old mice, cleaned and flushed to remove bone marrow) and cartilage 
(growth plates dissected from the distal femur and proximal tibia 
of three-day-old pups) was extracted using TRIzol (Invitrogen) fol- 
lowed by RNA clean-up using the RNeasy Mini Kit. mRNA was reverse 
transcribed using Superscript II Reverse Transcriptase (Invitrogen). 
Reverse transcription with quantitative PCR (RT-qPCR) was performed 
on the 7500 Fast Real-Time PCR System (Applied Biosystems). Spe- 
cific forward and reverse oligonucleotide primers were used either 
in conjunction with SYBR Green dye (Cpt1a, Acadm, Acadl and Myod
(also known as Myod1)) or with FAM-TAMRA conjugated probes (all
others). The following primers and probes were used: Sox9 (GenBank
accession number NM_011448): 5′-TCTGGAGGCTGCTGAACGA-3′
(forward), 5′-TCCGTTCTTCACCGACTTCCT-3′ (reverse),
5′-FAM-CAGCACAAGAAAGACCACCC-TAMRA-3′ (probe); Col2a1 (GenBank
accession number NM_031163): 5′-AGAACATCACCTACCACTGTAAGAACA-3′
(forward), 5′-TGACGGTCTTGCCCCACTT-3′ (reverse),
5′-FAM-CCTTGCTCATCCAGGGCTCCAATG-TAMRA-3′ (probe); Acan
(GenBank accession number NM_001361500):
5′-GCATGAGAGAGGCGAATGGA-3′ (forward),
5′-CTGATCTCGTAGCGATCTTTCTTCT-3′ (reverse),
5′-FAM-CTGCAATTACCAGCTGCCCTTCACGT-TAMRA-3′ (probe); Runx2
(GenBank accession number NM_001146038):
5′-TACCAGCCACCGAGACCAA-3′ (forward),
5′-AGAGGCTGTTTGACGCCATAG-3′ (reverse),
5′-FAM-CTTGTGCCCTCTGTTGTAAATACTGCTTGCA-TAMRA-3′ (probe); Ocn
(GenBank accession number NM_007541):
5′-GGCCCTGAGTCTGACAAAGC-3′ (forward),
5′-GCTCGTCACAAGCAGGGTTAA-3′ (reverse),
5′-FAM-ACAGACTCCGGCGCTACCTTGGAGC-TAMRA-3′ (probe); Pparg
(GenBank accession number NM_001127330):
5′-CCCAATGGTTGCTGATTACAAA-3′ (forward),
5′-AATAATAAGGTGGAGATGCAGGTTCT-3′ (reverse),
5′-FAM-CTGAAGCTCCAAGAATACCAAAGTGCGATC-TAMRA-3′ (probe);
Myod (GenBank accession number NM_010866):
5′-GCGCGAGTCCAGGCCAGG-3′ (forward),
5′-CGACTCTGGTGGTGCATCTGC-3′ (reverse);
Slc2a1 (GenBank accession number NM_011400):
5′-GGGCATGTGCTTCCAGTATGT-3′ (forward),
5′-ACGAGGAGCACCGTGAAGAT-3′ (reverse),
5′-FAM-CAACTGTGCGGCCCCTACGTCTTC-TAMRA-3′ (probe); Pfkfb3
(GenBank accession number NM_001177757): Mm.PT.51.16600796
(Integrated DNA Technologies); Ldha (GenBank accession number
NM_010699): 5′-TTCATCATTCCCAACATTGTCAA-3′ (forward),
5′-CACTGATTTTCCAAGCCACGTA-3′ (reverse),
5′-FAM-AGTCCACACTGCAAGCTGCTGATCGTC-TAMRA-3′ (probe); Cpt1a
(GenBank accession number NM_013495):
5′-GCCCATGTTGTACAGCTTCC-3′ (forward),
5′-TTGGAAGTCTCCCTCCTTCA-3′ (reverse); Acadm (GenBank accession
number NM_007382): 5′-TTTCGAAGACGTCAGAGTGC-3′ (forward),
5′-TGCGACTGTAGGTCTGGTTC-3′ (reverse); Acadl (GenBank accession
number NM_007381): 5′-TCTTTTCCTCGGAGCATGACA-3′ (forward),
5′-GACCTCICTACTCACTTCTCCAG-3′ (reverse). Expression levels were
analysed using the 2^−ΔΔCt method and were normalized for the
expression of the housekeeping gene Actb.
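
Relative quantification against a housekeeping gene is commonly computed with the comparative Ct (2^−ΔΔCt) calculation, sketched below. The function name and the Ct values in the usage example are hypothetical; the sketch assumes roughly 100% amplification efficiency for both the target and the reference gene, which is the standard assumption behind this method.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene versus the control condition.

    Each Ct is first normalized to the housekeeping gene (delta Ct),
    then the control delta Ct is subtracted (delta-delta Ct).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold 2 cycles earlier
# in the treated sample, while the reference gene (Actb) is unchanged.
fold_change = relative_expression(
    ct_target=24.0, ct_reference=18.0,
    ct_target_control=26.0, ct_reference_control=18.0,
)
```

In this example ΔΔCt is −2, which corresponds to a fourfold increase in expression relative to the control condition.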


RNA-seq, gene expression quantification and enrichment 
analysis of transcription binding motifs 

In brief, total RNA was extracted from C3H10T1/2 cells seeded in six-
well plates using TRIzol. Polyadenylated RNA enrichment, reverse
transcription and stranded library preparation were done using the
KAPA stranded mRNA-seq kit (Roche). The first 50 bases of these
libraries were sequenced on a HiSeq4000 (Illumina) and mapped to
the murine genome (build mm10) using TopHat v.2.1.1. Read counts
were processed using EdgeR v.3.20.9 to identify genes differentially
expressed between cells that were serum-starved (1% FBS) and cells
that were control-treated (10% FBS). The top-100 most-significantly
upregulated genes upon serum starvation (at a 1% false discovery rate;
differential expression in EdgeR is assessed for each gene using an exact
test analogous to Fisher's exact test, but adapted for overdispersed
data) were analysed for motif enrichment using i-cisTarget.


Single-cell RNA-seq of mouse long bone

The single-cell RNA-seq dataset of the mouse long bone and bone mar- 
row stroma was generated previously and detailed information on cell 
isolation, cell sorting, library preparation, RNA-seq and data processing
is provided in the original manuscript. A set of 40 genes involved
in FAO and 34 genes involved in glycolysis was curated from the Gene
Ontology database (http://software.broadinstitute.org/gsea/msigdb)
and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
(http://www.genome.jp/kegg). Gene expression was calculated as the
fraction of its unique molecular identifier (UMI; random barcode) count
with respect to total UMI in the cell and then multiplied by 10,000. We
denoted it as transcripts per 10,000 transcripts (TP10K).
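
The per-cell normalization described here (a gene's UMI count divided by the cell's total UMI count, multiplied by 10,000) can be sketched as below. The function name, the dictionary-based input and the example counts are illustrative assumptions; the original analysis pipeline is described in the cited manuscript.

```python
def tp10k(umi_counts):
    """Convert one cell's UMI counts to transcripts per 10,000 transcripts.

    umi_counts maps gene name to UMI count for a single cell.
    """
    total = sum(umi_counts.values())
    if total == 0:
        # A cell without any UMI has no defined fractions; report zeros.
        return {gene: 0.0 for gene in umi_counts}
    return {gene: count / total * 10_000
            for gene, count in umi_counts.items()}


# Hypothetical cell with 100 UMIs in total.
cell = {"Sox9": 5, "Cpt1a": 15, "Actb": 80}
normalized = tp10k(cell)
```

After normalization the values of all genes in a cell sum to 10,000, which makes expression levels comparable between cells with different sequencing depth.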


ChIP-qPCR 

ChIP-qPCR was performed as described. In brief, 3 h after serum 
deprivation, C3H10T1/2 cells were fixed using 1% formaldehyde, washed 
and collected by centrifugation (1,000g for 5 min at 4 °C). The pellet 
was resuspended in RIPA buffer (50 mM Tris-HCl pH 8, 150 mM NaCl, 
2 mM EDTA, 1% Triton-X100, 0.5% sodium deoxycholate, 1% SDS and 1% 
protease inhibitors), homogenized, incubated on ice for 10 min and 
sonicated. The samples were centrifuged (16,000g for 10 min at 4 °C). 
From the supernatant, sheared chromatin was used as input (1/30), 
and immunoprecipitation was performed on the remainder of the chromatin 
with an anti-FOXO1 antibody (rabbit anti-FOXO1, Abcam, 
ab39670) or an anti-FOXO3a antibody (rabbit anti-FOXO3a, Abcam, 
ab12162). After precipitation using Pierce Protein A/G Magnetic Beads 
(Thermo Fisher Scientific), followed by RNA and protein digestion, DNA 
was purified using Agencourt AMPure XP (Beckman Coulter) according 
to the manufacturer's instructions. RT-qPCR was performed using 
SYBR GreenER qPCR SuperMix Universal (Thermo Fisher Scientific) and 
specific primers for the Sox9 promoter region (5'-TGTGGGCATATTGGCTTCT-3' 
(forward), 5'-GGTTAAACTGGGAAGACTCATGG-3' (reverse)). 
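
The text does not state how promoter occupancy was computed from the qPCR readout. One common convention is the percent-input method, sketched below under explicit assumptions: the function name `percent_input` and the Ct values are hypothetical, amplification efficiency is taken as 100% (a doubling per cycle), and the only number taken from the protocol above is the 1/30 input fraction.

```python
import math


def percent_input(ct_input, ct_ip, input_fraction=1 / 30):
    """Percent-input estimate for a ChIP-qPCR target (assumed method).

    ct_input: Ct of the input sample (here 1/30 of the sheared chromatin).
    ct_ip: Ct of the immunoprecipitated sample.
    Assumes perfect doubling per PCR cycle.
    """
    # Shift the input Ct to what it would be for 100% of the chromatin.
    adjusted_input_ct = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (adjusted_input_ct - ct_ip)


# Hypothetical Ct values (illustration only).
example = percent_input(ct_input=25.0, ct_ip=25.0)
```

With equal Ct values for input and immunoprecipitate, the estimate equals the input fraction itself (100/30 percent), which is a convenient sanity check for the dilution adjustment.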


Statistical analysis 

All numerical results are reported as mean ± s.e.m. Statistical significance 
of the difference between experimental groups was analysed 
by two-tailed Student's t-test, one-way, two-way or three-way ANOVA 
with Bonferroni post hoc test (as indicated in the figure legends and 
source data files) using the GraphPad Prism software. Differences were 
considered statistically significant for P < 0.05. In the studies performed 
in cell lines in culture, all experiments were independently repeated at 
least three times. Experiments using primary cells were performed with 
at least three biological replicates. Western blots were independently 
repeated at least twice. Mice for experiments were randomly allocated 
to groups. All numerical values used for graphs and detailed statistical 
analysis can be found in the source data files. 
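
For readers reproducing the summary statistics outside GraphPad Prism, the two simplest quantities named above can be sketched as follows. This is an illustrative sketch only: the function names and example values are hypothetical, and the Bonferroni step shown is the generic correction of a set of P values, not a re-implementation of Prism's post hoc procedure.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one experimental group."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed for the s.e.m.")
    # s.e.m. = sample standard deviation / sqrt(n)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust P values; return (adjusted P, significant?) pairs."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical data (illustration only).
group_mean, group_sem = mean_sem([1.0, 2.0, 3.0])
results = bonferroni([0.01, 0.02, 0.5])
```

The Bonferroni adjustment multiplies each P value by the number of comparisons, so a comparison that passes P < 0.05 on its own may no longer pass once several comparisons are made.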


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

The bulk RNA-seq data that support the findings of this study have been 
deposited in ArrayExpress with the accession number E-MTAB-7564. 
The single-cell RNA-seq data were generated previously and are 
deposited in the Gene Expression Omnibus with accession number 
GSE128423. A portal for exploring the entire atlas is available at 
https://portals.broadinstitute.org/single_cell/study/mouse-bone-marrow-stroma-in-homeostasis. 
Source Data for Figs. 1-4 and Extended Data 
Figs. 1-8 are provided with the paper. All other data supporting the 
findings of this study are available within the paper. 


Code availability 


The full code used for the computational model of bone-graft healing 
is available from the authors upon request. More background information 
on the development of the model can be found in our previous 
publications. 
Extended Data Fig. 1 | Removal of periosteum reduces bone formation and 
callus vascularization. a, Histological characterization of the mouse 
bone-autograft healing model. At the host-graft junction, cartilage (safranin O+) is 
formed at PFD7. Note the absence of CD31+ blood vessels in these regions. Near the 
graft centre, new woven bone (bright pink on H&E staining) is deposited, 
cartilage is absent and blood vessels are abundant. By PFD14, the cartilage at 
the host-graft junction is gradually being replaced by bone, while the woven 
bone near the graft centre appears mature (representative images of four 
mice). Scale bars, 200 μm in host-graft junction images, 100 μm in graft-centre 
images, 50 μm in magnifications. b, MicroCT-based visualization and 
quantification of newly formed bone around control autografts, autografts 
from which the periosteum was removed or devitalized allografts (no living 
cells) at PFD28 (n = 3 mice). Coverage ratio represents percentage of graft 
surface covered by new bone. c, Dual-energy microCT-based visualization and 
quantification of vascularization in a 250-μm-wide region around autografts 
and allografts at PFD14 (n = 5 mice for autograft, n = 6 mice for devitalized 
allograft). d, CD31 immunohistochemical visualization and quantification of 
vascularization in a 250-μm-wide region around autografts and allografts at 
PFD14 (n = 3 mice). Scale bars, 500 μm. b, bone; c, cartilage; ft, fibrous tissue; 
g, graft; h, host; m, muscle; p, periosteum. Data are mean ± s.e.m.; one-way 
ANOVA with Bonferroni post hoc test (b), two-tailed Student's t-test (c, d). 
Extended Data Fig. 2 | Reducing vascularization alters but does not prevent 
bone healing. a, Histological visualization and quantification of apoptotic 
cells (TUNEL+; n = 4 mice for control, n = 5 mice for filter 0.2) in the callus of 
grafts with or without a filter (0.2 μm pore size) at PFD7. Scale bars, O ym 
b, Histological visualization and quantification of proliferating (BrdU 
mice) cells in the callus of grafts with or without a filter (0.2 μm pore size) at 
PFD7. Scale bars, 100 μm. c, MicroCT-based visualization and quantification of 
newly formed bone around control grafts or grafts surrounded by a filter 
(0.2 μm pore size) at PFD14 (n = 4 mice for control, n = 6 mice for filter 0.2). 
Coverage ratio represents percentage of graft surface covered by new bone. 
d, Cell tracing of donor periosteal cells during healing of bone grafts, derived 
from CAG-eGFP mice, with or without filter (0.2 μm pore size) at PFD14, showing 
equal contribution of donor cells to cartilage in both conditions, but reduced 
contribution of donor cells to bone near the graft ends. Arrows, GFP+ 
osteoblasts; arrowheads, GFP+ osteocytes; representative images of three 
mice. Scale bars, 50 μm. e, MicroCT-based visualization and quantification of 
newly formed bone around control grafts or grafts surrounded by a filter 
(0.2 μm pore size) at PFD28 (n = 3 mice). f, Histological analysis of autografts 
with or without filter (0.2 μm pore size) at PFD28 showing comparable callus 
morphology and composition, although remaining cartilage islands (detail 
image) were seen when a filter was present but not in the callus of control grafts 
(representative images of three mice). Scale bars, 500 μm. Data are 
mean ± s.e.m.; two-tailed Student's t-test. 
Extended Data Fig. 3 | In silico modelling supports a role for nutritional 
stress in chondrogenic commitment. Application of a previously described 
computational model of bone repair to the bone-graft healing setup. In this 
model, the behaviour (survival, proliferation, differentiation and tissue 
formation) of skeletal progenitor cells, chondrocytes, osteoblasts and 
fibroblasts is dependent on the local supply of nutrients by blood vessels, in 
addition to the presence of growth factors, extracellular matrix and the cell 
density. a, Schematic overview (top) of the modelled region shown in green. 
The hatched area represents the graft callus. At the start of the simulation the 
modelled region was filled with loose fibrous tissue matrix, growth factors, 
stem cells, osteoblasts, fibroblasts and nutrients, representing the fracture 
haematoma. Overview of the Dirichlet boundary conditions (bottom) showing 
the starting points of blood vessels and the sites of release of cells and growth 
factors (and nutrients for the condition with filter) during the healing process. 
b, Application of the model to the normal bone graft (that is, blood vessels can 
come from the muscle side). Heat map-based visualization of blood vessel, 
nutrient, cartilage and bone distribution in the modelled region at different 
time points shows that the model correctly predicts the spatiotemporal 
progression of the bone-healing process. Nutrients and tissue fractions are 
expressed on a non-dimensional scale ranging from 0 (absence) to 1 
(saturation). c, d, Application of the model to bone-graft healing in the 
presence of a filter placed in between graft and muscle (that is, blood vessels 
cannot come from the muscle side) with visual representation (c) and 
quantification (d) of the different tissue fractions in the modelled region. 
Quantification was performed only in the left rectangle of the modelled region, 
as indicated by the hatched area in a, representing the graft callus. The amount 
of nutrients that can pass through the filter (the boundary condition (BC)) was 
varied between 100% (the maximum amount that can be supplied by the 
vasculature, applied to the whole filter length, resulting in similar nutrient 
distributions as in the control) and 0%. When nutrient supply through the filter 
is set at 20-40%, the model correctly recapitulates the chondrogenic switch in 
the central region of the graft as observed in vivo. When nutrient supply 
through the filter was >40%, the cells in the central graft region differentiated 
directly into osteoblasts, and a supply of nutrients <20% induced massive cell 
death and completely prevented tissue formation and graft healing. e, Visual 
representation of the effect of additional growth factor (gf) diffusion and/or 
progenitor cell (prog) migration from the filter side on cartilage and bone 
fractions at day 14. The control situation (no filter) is shown on the top and the 
filter situation with a boundary condition for nutrients of 40% is shown on the 
bottom. No large effect of these additional boundary conditions on the healing 
response was observed. 
Extended Data Fig. 4 | Skeletal progenitors resist nutritional stress via 
induction of SOX9. a, Immunoblot detection of nuclear SOX9 in C3H10T1/2 
cells and periosteal cells exposed for 24 h to control or CND medium, with 
lamin A/C as loading control (n = 2 independent experiments). b, mRNA levels 
of Sox9 and Col2a1 in periosteal cells exposed for the indicated times to control 
or CND medium (relative to control; n = 3 biologically independent samples). 
c, mRNA levels of runt-related transcription factor 2 (Runx2; osteogenic 
lineage), peroxisome proliferator-activated receptor γ (Pparg; adipogenic 
lineage) and Myod (myogenic lineage) in periosteal cells exposed for 48 h to 
control or CND medium (relative to control; n = 3 biologically independent 
samples). d, mRNA levels of Sox9 in C3H10T1/2 cells exposed for the indicated 
times to control or SD medium (relative to control, n = 3 independent 
experiments). e, Immunoblot detection of total SOX9 in C3H10T1/2 cells 
exposed for different durations to control or SD medium, with β-actin as 
loading control (n = 2 independent experiments). f, Immunoblot detection of 
nuclear and cytoplasmic SOX9 in C3H10T1/2 cells exposed for 6 h to control or 
SD medium, with lamin A/C or β-actin as loading control (n = 2 independent 
experiments). g, Immunoblot detection of SOX9 in total cell protein extracts of 
C3H10T1/2 cells exposed for 6 h to control medium, SD medium or SD medium 
supplemented with different concentrations of the transcription inhibitor 
actinomycin D (Act. D) or the translation inhibitor cycloheximide (CHX). 
Detection of β-actin was used as loading control (n = 2 independent 
experiments). h, mRNA levels of Runx2, Pparg and Myod in C3H10T1/2 cells 
exposed for the indicated times to control or SD medium (relative to control, 
n = 3 independent experiments). i, Immunoblot detection of nuclear SOX9 in 
periosteal cells exposed for 24 h to control or SD medium with lamin A/C as 
loading control (n = 3 biologically independent samples). j, Osteogenic 
differentiation of periosteal cells in control or SD medium, assessed by 
visualization of mineral deposits (alizarin red staining) and quantification of 
Ocn mRNA levels (relative to Actb, n = 3 biologically independent samples). 
k, Immunoblot detection of SOX9 in total cell protein extracts of C3H10T1/2 
cells (in control or SD medium), periosteal cells and growth plate-derived 
chondrocytes transduced with shSox9 or shScr, with β-actin as loading control. 
A longer exposure time was used for SOX9 detection in C3H10T1/2 cells and 
periosteal cells compared with chondrocytes in order to visualize any 
remaining protein in the shSox9 conditions (n = 2 independent experiments 
for C3H10T1/2 cells, n = 3 biologically independent samples for periosteal cells, 
growth plate-derived chondrocytes). l, Quantification of cell viability of 
C3H10T1/2 cells, periosteal cells and growth plate-derived chondrocytes 
transduced with shSox9 or shScr, after 72 h of exposure to control, SD or CND 
medium (n = 3 independent experiments for C3H10T1/2 cells, n = 3 biologically 
independent samples for periosteal cells, growth plate-derived chondrocytes). 
Data are mean ± s.e.m.; two-way ANOVA with Bonferroni post hoc test (b, d, h, l), 
two-tailed Student's t-test (c, j). For gel source data, see Supplementary Fig. 1. 
Extended Data Fig. 5 | Reduced lipid availability favours chondrogenesis 
over osteogenesis. a-c, Immunoblot detection of total SOX9 in C3H10T1/2 
cells exposed for 6 h to control medium, SD medium or SD medium 
supplemented with increasing concentrations of palmitate (a), VLDL (b) or 
PUFA (c). Detection of β-actin was used as loading control. EtOH was used as a 
vehicle control in a and c (n = 2 independent experiments). d, Histological 
visualization (by immunofluorescence for COL2) of chondrogenic 
differentiation of periosteal cells in pellet cultures in control, SD or LRS 
medium supplemented with vehicle (EtOH), oleate or PUFA (representative 
images of n = 2 independent experiments). Scale bars, 100 μm. e, Osteogenic 
differentiation of periosteal cells in control, SD or LRS medium, assessed by 
visualization of mineral deposits (alizarin red staining) and quantification of 
Ocn mRNA levels (relative to Actb, n = 3 biologically independent samples). 
f, Flow cytometric detection and quantification of the percentage of SOX9high 
cells and total SOX9 levels in C3H10T1/2 cells, periosteal cells and skeletal stem 
cells exposed for 24 h to control, SD or LRS medium (n = 4 independent 
experiments for C3H10T1/2 cells, n = 4 biologically independent samples for 
periosteal cells, skeletal stem cells). Gating for SOX9high cells was set to have 
approximately 10% SOX9high cells in control conditions in each cell type. 
g, h, Flow cytometric quantification of cell cycle (g) and apoptosis (h) in 
SOX9low and SOX9high subpopulations of C3H10T1/2 cells, periosteal cells and 
skeletal stem cells exposed for 24 h to control, SD or LRS medium (n = 3 
independent experiments for C3H10T1/2 cells, n = 3 biologically independent 
samples for periosteal cells, skeletal stem cells). i, Histological visualization 
and quantification of early chondrogenic (SOX9+) and osteogenic 
(Col1a1-DsRed+) cells in metatarsals cultured for one week in control medium, SD 
medium or SD medium supplemented with PUFA or vehicle (EtOH) (n = 6 
biologically independent samples for control, SD and SD + vehicle, n = 7 
biologically independent samples for SD + PUFA). Scale bars, 50 μm. 
j, Histological visualization of mineralization by Von Kossa staining in 
metatarsals cultured for one week in control medium, SD medium or SD 
medium supplemented with vehicle or PUFA (representative images of n = 
biologically independent samples for control, SD and SD + vehicle, n = 7 
biologically independent samples for SD + PUFA). Scale bars, 100 μm. 
k, Histological visualization (safranin O staining) and quantification of 
cartilage and woven bone in the callus at PFD7 of mice treated daily with free 
fatty acids (FFA; 20 μl corn oil) or sham injection (saline) at the fracture site 
(n = 5 mice). Scale bars, 500 μm. l, Flow cytometric quantification of total SOX9 
levels in C3H10T1/2 cells or skeletal stem cells exposed for 24 h to control, SD or 
LRS medium supplemented with 100 μM GW9508 or vehicle (DMSO) ( 
independent experiments for C3H10T1/2 cells, n = 3 biologically independent 
samples for skeletal stem cells). m, Visualization and quantification of 
diffusion of a fluorescent fatty acid (FL-C16) and fluorescent glucose (2-NBDG) 
in collagen gels seeded with periosteal cells (510° per ml) (n = 3 biologically 
independent samples for FL-C16, n = biologically independent samples for 
2-NBDG). Scale bars, 500 μm. n, o, Visualization of alcian blue staining (n) and 
visualization and quantification of Sox9 expression (o) in micromass 
co-cultures of periosteal cells from Sox9-GFP mice and sorted cell populations 
from skeletal muscle of CAG-DsRed mice, after nine days in chondrogenic SD 
medium (n = 4 biologically independent samples). Addition of oleate was used 
as positive control. Scale bars, 100 μm. EC, endothelial cell; MΦ, macrophage. 
Data are mean ± s.e.m.; one-way ANOVA (e, i, o), two-way ANOVA (h, l) or 
three-way ANOVA (g) with Bonferroni post hoc test, two-tailed Student's t-test 
(k, m). For gel source data, see Supplementary Fig. 1. 
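
The gating rule stated in panel f (a gate placed so that roughly 10% of control cells fall in the high-SOX9 population) can be illustrated with a short sketch. This is not the FlowJo workflow used by the authors: the function names and intensity values are hypothetical, and taking the gate directly from sorted control intensities is an assumption.

```python
def high_gate(control_intensities, target_fraction=0.10):
    """Pick an intensity gate so that about target_fraction of control cells lie at or above it."""
    ordered = sorted(control_intensities)
    n_high = max(1, round(len(ordered) * target_fraction))
    # The n_high brightest control cells define the high population.
    return ordered[-n_high]


def fraction_high(intensities, gate):
    """Fraction of cells whose intensity is at or above the gate."""
    return sum(x >= gate for x in intensities) / len(intensities)


# Hypothetical fluorescence intensities (illustration only).
control = list(range(1, 101))
treated = list(range(51, 151))
gate = high_gate(control)
control_high = fraction_high(control, gate)
treated_high = fraction_high(treated, gate)
```

Fixing the gate on the control sample of each cell type and then applying it unchanged to the treated samples is what makes the reported percentages comparable across conditions.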
Extended Data Fig. 6 | Chondrocytes do not depend on FAO. a, Quantification 
of glycolytic rate, oxygen consumption and palmitate oxidation in periosteal 
cells (PC, n = 5 biologically independent samples), skeletal stem cells (SSC, 
biologically independent samples), growth plate-derived chondrocytes (GCH, 
biologically independent samples for oxygen consumption, n = 4 
biologically independent samples for glycolysis and palmitate oxidation), rib 
chondrocytes (RCH, n = 5 biologically independent samples for oxygen 
consumption, n = 4 biologically independent samples for glycolysis and 
palmitate oxidation), calvarial osteoblasts (COB, n = 5 biologically independent 
samples) and trabecular osteoblasts (TOB, n = 5 biologically independent 
samples). b, t-Distributed stochastic neighbour embedding (tSNE) plot of 
20,896 non-haematopoietic cells (mixed bone and bone marrow fractions, 
n = 6 mice) based on single-cell RNA-seq data, annotated post hoc and coloured 
by clustering (top) or by expression (ln(TP10K)) of selected genes (bottom). 
c, Expression (row-wide z-score of ln of average TP10K; single-cell RNA-seq) of 
FAO- and glycolysis-related genes (rows) in the cells of each cluster (columns). 
d, RT-qPCR analysis of genes involved in glycolysis (Glut1 (also known as 
Slc2a1), Pfkfb3 and Ldha; n = 6 independent samples for Glut1 and Pfkfb3 in 
cartilage, n = 9 independent samples for Glut1 and Pfkfb3 in bone, n = 8 
independent samples for Ldha) and FAO (Cpt1a, Acadm and Acadl; n = 8 
independent samples) in mouse growth plate cartilage and cortical bone 
biopsies (relative to Actb). e, Analysis of adjacent histological sections of a 
growth plate and fracture callus (PFD7) of mice injected intravenously with a 
fluorescent fatty acid (Red-C12) or glucose (2-NBDG) (representative images of 
n = 3 mice). Scale bars, 100 μm in growth plate images, 50 μm in fracture callus 
images. f, Immunofluorescence analysis of a fracture callus (PFD7) of a mouse 
injected intravenously with a fluorescent fatty acid (Red-C12) and stained for 
SOX9 (left; cartilage area shown) or COL1 (right; trabecular bone area shown) 
(representative images of n = 3 mice). Scale bars, 50 μm. g, Histological 
visualization and quantification at PFD7 of CAG-DsRed+ skeletal stem cells 
(SSC), transduced with shCpt1a or shScr and transplanted at the fracture site on 
PFD0 (n = 3 mice). Dotted lines delineate cortical bone ends. h, Quantification 
of number of live and dead cells in cultures of periosteal cells, growth 
plate-derived chondrocytes and calvarial osteoblasts after 48 h of exposure to 
etomoxir (n = biologically independent samples). Data are mean ± s.e.m.; 
one-way (a) or two-way (h) ANOVA with Bonferroni post hoc test, two-tailed 
Student's t-test (d, g). 
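
The row-wide z-score of the natural log of average TP10K used for the heat map in panel c can be sketched as below. This is a minimal illustration rather than the authors' code: the function name and example values are hypothetical, and both the pseudocount of 1 (added to avoid ln(0)) and the use of the population standard deviation are assumptions not stated in the caption.

```python
import math
import statistics


def row_zscores(avg_tp10k_by_cluster, pseudocount=1.0):
    """Z-score one gene (one heat-map row) across clusters.

    avg_tp10k_by_cluster: average TP10K of the gene in each cluster.
    pseudocount: assumed offset so that clusters with zero expression can be logged.
    """
    logged = [math.log(value + pseudocount) for value in avg_tp10k_by_cluster]
    mu = statistics.mean(logged)
    sd = statistics.pstdev(logged)
    if sd == 0:
        # A gene with identical expression in every cluster has no spread to scale by.
        return [0.0] * len(logged)
    return [(x - mu) / sd for x in logged]


# Hypothetical average TP10K values of one gene in three clusters (illustration only).
z = row_zscores([0.0, 1.0, 3.0])
```

Scaling each row separately means colours in the heat map show where a gene is relatively high or low across clusters, not how abundant it is compared with other genes.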
Extended Data Fig. 7 | Changes in FAO and autophagy after lipid deprivation. 
a, Oxidation of extracellularly added palmitate by periosteal cells in control 
medium or at different times in LRS medium (n = 4 biologically independent 
samples). b, Quantification of FAO-linked OCR in periosteal cells in control 
medium or at different times in LRS medium (n = 4 biologically independent 
samples). c, Confocal microscopy of periosteal cells labelled with Red-C12 
(fluorescent fatty acid, red) and stained with MitoTracker (mitochondria, 
green) and DPH (lipid droplets, blue) shows increased colocalization (as 
quantified by Pearson's correlation coefficient) of MitoTracker and Red-C12 
after exposure of cells for 6 h to SD medium (n = 4 biologically independent 
samples). Scale bars, 20 μm. d, Immunoblot detection of LC3 in total cell 
protein extracts of C3H10T1/2 cells and periosteal cells exposed for different 
times to control or SD medium, with β-actin as loading control. Note increased 
conversion of LC3-I to LC3-II at early time points, indicative of activation of 
autophagy (n = 2 independent experiments). e, f, Confocal microscopy of 
C3H10T1/2 cells (e; n = 3 independent experiments) or periosteal cells (f; n = 3 
biologically independent samples), expressing an RFP-GFP-LC3 tandem 
construct, shows activation of autophagy with time upon serum deprivation, 
evidenced by increased total number of LC3 puncta per cell and higher 
percentage of RFP+GFP− puncta. Scale bars, 20 μm. g, Confocal 
microscopy-based visualization (top) and quantification (bottom) of C3H10T1/2 cells 
stained with the neutral lipid dye DPH to reveal lipid-droplet dynamics at 
different time points after SD. Cells were transduced with shAtg5 to inhibit 
autophagy or shScr as a control (n = 6 independent experiments). Scale bars, 
20 μm. h, Quantification of FAO-linked OCR in periosteal cells in control 
medium or at different times after serum deprivation, treated with 10 μM 
chloroquine (CQ) or vehicle (n = 3 biologically independent samples). 
i, Quantification of cell viability of C3H10T1/2 cells and periosteal cells after 
72 h of exposure to control or SD medium in the presence or absence of 50 μM 
(C3H10T1/2 cells) or 10 μM (periosteal cells) CQ (n = 3 independent 
experiments for C3H10T1/2 cells, n = 3 biologically independent samples for 
periosteal cells). j, Immunoblot detection of total SOX9 in C3H10T1/2 cells and 
nuclear SOX9 in periosteal cells exposed for 6 h (C3H10T1/2 cells) or 24 h 
(periosteal cells) to control medium (with DMSO as vehicle control) or medium 
supplemented with 100 μM etomoxir (Eto), with β-actin or lamin A/C as loading 
control (n = 2 independent experiments for C3H10T1/2 cells, n = 3 biologically 
independent samples for periosteal cells). k, Cell morphology of growth 
plate-derived chondrocytes transduced with shSox9 or shScr (representative images 
of six biologically independent samples). Scale bar, 100 μm. l, RT-PCR 
analysis of genes involved in chondrogenesis (Sox9, Col2a1 and Acan) and FAO 
(Cpt1a, Acadm and Acadl) in growth plate-derived chondrocytes transduced 
with shSox9 or shScr (relative to shScr, n = 6 biologically independent samples). 
Data are mean ± s.e.m.; one-way ANOVA (a, b, e, f) or two-way ANOVA (g-i) with 
Bonferroni post hoc test, two-tailed Student's t-test (c, l). For gel source data, 
see Supplementary Fig. 1. 
Extended Data Fig. 8 | Lipids regulate SOX9 through FOXO signalling. a, Heat 
map showing differential expression of cartilage-related genes in C3H10T1/2 
cells exposed for different times to SD versus control medium, as determined 
by RNA-seq (n = 3 replicates). b, Volcano plot showing significantly enriched 
and depleted mRNAs in C3H10T1/2 cells exposed for 3 or 6 h to SD versus 
control medium, as determined by RNA-seq (n = 3 replicates). c, Top 10 most 
significantly enriched transcription factor motifs with normalized enrichment 
scores (NES) in C3H10T1/2 cells exposed for 3 h (left) or 6 h (right) to SD versus 
control medium, as determined by i-cisTarget analysis on the 100 most 
significantly increased mRNAs (n = 3 replicates). Motif shown on top is the 
Hmga1 motif for 3 h and the Atf4 motif for 6 h. d, Confocal microscopy of 
C3H10T1/2 cells stained for FOXO1 after exposure of cells for 3 h to SD or LRS 
medium in the presence of vehicle (EtOH), oleate (60 μM) or PUFA 
(representative images of two independent experiments). Scale bars, 20 μm. 
e, Nuclear FOXO activity in C3H10T1/2 cells exposed for 3 h to control, SD or 
LRS medium (n = 5 independent experiments). f, Nuclear FOXO activity in 
skeletal stem cells exposed for 3 h to control medium, LRS medium or LRS 
medium supplemented with PUFA (n = 3 biologically independent samples). 
EtOH was used as vehicle control. g, Occupancy of FOXO1 at the Sox9 promoter 
of Cas9-expressing C3H10T1/2 cells transduced with sgFoxo1, sgFoxo3a or 
sgScr, exposed for 3 h to control or SD medium, as determined by ChIP-qPCR 
(n = 3 independent experiments). h, Flow cytometric quantification of total 
SOX9 levels in C3H10T1/2 cells (n = 4 independent experiments for control and 
serum deprivation, n = 3 independent experiments for LRS) and skeletal stem 
cells (n = 3 biologically independent samples) exposed for 24 h to control, SD or 
LRS medium supplemented with 1 μM AS1842856 or vehicle (DMSO). 
i, Immunoblot detection of total SOX9 in Cas9-expressing C3H10T1/2 cells 
transduced with inducible sgFoxo1 and sgFoxo3a (sgFoxo1/3a) or with sgScr, 
exposed for 6 h to control, SD or LRS medium in the presence or absence of 
doxycycline (dox; 250 ng ml−1), with β-actin as loading control (n = 2 
independent experiments). j, Flow cytometric quantification of total SOX9 
levels in skeletal stem cells transduced with shFoxo1 and shFoxo3a (shFoxo1/3a) 
or with shScr, exposed for 24 h to control, SD or LRS medium (n = 5 biologically 
independent samples). k, Histological visualization and quantification of 
FOXO3a-expressing cells in the fracture callus at PFD7 of mice treated daily 
with GW9508 (10 nmol) or vehicle (0.2% DMSO in saline) at the fracture site 
(n = 5 mice). Scale bars, 500 μm. Dotted lines delineate cortical bone ends. 
l, Histological visualization and quantification in the fracture callus at PFD7 of 
CAG-DsRed+ skeletal stem cells (SSC), transduced with shFoxo1/3a or shScr and 
transplanted at the fracture site on PFD0 (n = 5 mice). Dotted lines delineate 
cortical bone ends. Data are mean ± s.e.m.; one-way ANOVA (e, f), two-way 
ANOVA (g, h, j) with Bonferroni post hoc test, two-tailed Student's t-test (k, l). 
For gel source data, see Supplementary Fig. 1. 
Extended Data Fig. 9 | Flow cytometry gating for cell sorting. a, Contour 
plots showing the gating strategy for the identification and isolation of skeletal 
... gating strategy for the identification and isolation of macrophages, 
endothelial cells and pericytes from skeletal muscle of adult mice. 
natureresearch 


Last updated by author(s): Nov 26, 2019 


Reporting Summary 


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection qRT-PCR: StepOne Real-Time PCR software 2.3 
Flow cytometry: BD FACSDiva 8.0 
Seahorse flux analyzer: XF Reader 1.8.1.1 
Scintillation counting: QuantaSmart TM 4.0 Perkin Elmer 
Imaging: Zeiss Zen 2.5, Zeiss AxioVision 4.9.1 
Computer modelling: MATLAB R2012a/R2013a 


Data analysis CT: CT Analyzer 1.16.4.1, CT Vol 2.3.2.0, MeVisLab 2.6.2 
Flow cytometry: FlowJo 10.5.3 
Statistics: GraphPad Prism 8.1.2 
Graphs: GraphPad Prism 5.0 
Image analysis: ImageJ/Fiji 2.0.0, CellProfiler 3.1.8 
Image preparation: Photoshop CS5 
RNAseq/scRNAseq analysis: TopHat 2.1.1, R statistical software 3.5.1, EdgeR 3.20.9, i-cisTarget 
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


The bulk mRNA sequencing data that support the findings of this study have been deposited in ArrayExpress with the accession number E-MTAB-7564 
(http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-7564). The single cell RNA sequencing data were generated previously and are deposited in GEO (GSE128423, 
https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE128423). A portal for exploring the entire atlas is available 
(https://portals.broadinstitute.org/single_cell/study/mouse-bone-marrow-stroma-in-homeostasis). All other data supporting the findings of this study are available within the paper. 


Figures 2, 4 and Extended Data Figures 4, 5, 7, 8 have associated raw data, provided as Supplemental information Figure 1 for uncropped blot pictures. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size No statistical tests were used to pre-determine sample size, but sample size was chosen based on previous experiments and comparable 
studies in literature. Sample size for each experiment is indicated in the legend. 

Data exclusions No data were excluded. 

Replication In the studies performed in cell lines in culture, all experiments were independently repeated at least three times. Experiments using primary 
cells were performed with at least three biological replicates. Western blots were independently repeated at least twice. All attempts at 
replication were successful. 

Randomization Mice for experiments were randomly allocated to groups. 

Blinding Blinding was widely used in the study. Data collection and analysis, such as immunostaining, qRT-PCR and Western blot, were frequently 
performed by participants other than the experiment designer. During these data collection and analysis steps, all participants were routinely 
blinded to group allocation. 


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.


Materials & experimental systems
Antibodies: involved in the study
Eukaryotic cell lines: involved in the study
Palaeontology: n/a
Animals and other organisms: involved in the study
Human research participants: n/a
Clinical data: n/a

Methods
ChIP-seq: n/a
Flow cytometry: involved in the study
MRI-based neuroimaging: n/a


Antibodies 


Antibodies used Antibodies for flow cytometry:
PE/Cy7 anti-mouse CD45: BioLegend, #103114, clone 30-F11, lot 8243728, 1/200 dilution
PE/Cy7 anti-mouse TER-119: BioLegend, #116222, clone TER-119, lot B251241, 1/200 dilution
APC anti-mouse CD202b (Tie-2, CD202): BioLegend, #124010, clone TEK4, lot 8231548, 1/200 dilution
PE anti-mouse CD202b (Tie-2, CD202): BioLegend, #124008, clone TEK4, lot 8207408, 1/200 dilution
APC anti-mouse CD105: BioLegend, #120414, clone MJ7/18, lot 8204640, 1/200 dilution
Pacific Blue anti-mouse CD105: BioLegend, #120412, clone MJ7/18, lot 8245562, 1/200 dilution
APC anti-mouse CD90.2: BioLegend, #105312, clone 30-H12, lot 8208842, 1/200 dilution
FITC anti-mouse Ly-51 (6C3): BioLegend, #108305, clone 6C3, lot 8218198, 1/200 dilution
PerCP-eFluor 710 anti-mouse CD200: eBioscience, #46-5200-82, clone OX90, lot 4298111, 1/200 dilution
Biotin anti-mouse CD51: BD Pharmingen, #551380, clone RMV-7, lot 3301919, 1/200 dilution
PE anti-Ki67: BD Pharmingen, #556027, clone B56, lot 5113743, 1/10 dilution
PE anti-active Caspase 3: BD Pharmingen, #550821, clone C92-605, lot 25660, 1/100 dilution
AlexaFluor 647 anti-SOX9: Cell Signaling Technology, #71273, clone D8G8H, lot 1, 1/100 dilution
APC anti-mouse CD146: BioLegend, #134712, clone ME-9F1, lot 8268897, 1/200 dilution
Pacific Blue anti-mouse F4/80: BioLegend, #123124, clone BM8, lot 8217178, 1/200 dilution
PerCP/Cy5.5 anti-mouse CD31: BioLegend, #102522, clone MEC13.3, lot 4052815, 1/200 dilution


Primary antibodies for immunohistochemical analysis:
BrdU: Bio-Rad, #OBT0030, clone BU 1/75-ICR1, rat, lot 0512, 1/500 dilution
CD31: BD Biosciences, #550274, clone MEC 13.3, rat, lot 7292994, 1/50 dilution
type I collagen: Novus Biologicals, clone NB600-408, polyclonal, rabbit, lot 40267, 1/100 dilution
type II collagen: Merck, MAB8887, monoclonal, mouse, clone 6B3, lot 2933390, 1/200 dilution
SOX9: Novus Biologicals, NBP1-85551, polyclonal, rabbit, lot 8113838, 1/200 dilution
CPT1a: Cell Signaling Technology, #12252, clone D3B3, rabbit, lot 1, 1/50 dilution
GLUT1: Cell Signaling Technology, #12939, clone D3J3A, rabbit, lot 1, 1/100 dilution
FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit, lot 5, 1/100 dilution


Secondary antibodies for immunohistochemical analysis:
Biotin anti-mouse: Dako, #E0433, goat, lot 00062137
Fluorescein anti-mouse: Sigma-Aldrich, #F0257, goat, lot SLBV6490
Biotin anti-rat: BD Biosciences, #559286, goat, lot 6321784
Biotin anti-rat: Dako, #E0468, rabbit, lot 00043220
Biotin anti-rabbit: Dako, #E0432, goat, lot 20027287
Cy3 anti-rabbit: Jackson ImmunoResearch, #111-165-003, goat, lot 84241
AlexaFluor 546 anti-rabbit: Invitrogen, #A-11010, goat, lot 1904467
AlexaFluor 488 anti-rabbit: Invitrogen, #A-11034, goat, lot 1937195


Primary antibodies for immunocytochemical analysis:
FoxO1: Cell Signaling Technology, #2880, clone C29H4, rabbit, lot 11, 1/100 dilution
FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit, lot 5, 1/100 dilution


Secondary antibodies for immunocytochemical analysis:
AlexaFluor 488 anti-rabbit: Invitrogen, #A-11034, goat, lot 1937195


Primary antibodies for Western blot:
SOX9: Novus Biologicals, NBP1-85551, polyclonal, rabbit, lot 8113838, 1/2000 dilution
FoxO1: Cell Signaling Technology, #2880, clone C29H4, rabbit, lot 11, 1/1000 dilution
FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit, lot 5, 1/1000 dilution
LC3B: Cell Signaling Technology, #3868, clone D11, rabbit, lot 11, 1/500 dilution
β-actin: Sigma-Aldrich, #A5441, clone AC-15, mouse, lot 026M, 1/10000 dilution
Lamin A/C: Santa Cruz Biotechnology, #sc-376248, clone E-1, mouse, lot C1412, 1/5000 dilution


Secondary antibodies for Western blot analysis:
HRP anti-mouse: Dako, #P0161, rabbit, lot 00095192
HRP anti-rabbit: Dako, #P0448, goat, lot 00094764


Primary antibodies for ChIP-qPCR:
FoxO1: Abcam, #ab39670, rabbit, lot GR3192176-1, 1/250 dilution
FoxO3a: Abcam, #ab12162, rabbit, lot GR226465-14, 1/250 dilution


Validation All antibodies were obtained from the indicated commercial vendors with ensured quality. In addition, all the antibodies have been
used in multiple experiments to detect the intended proteins in control samples with the expected molecular weight to validate their
effectiveness in our study.


Antibodies for flow cytometry: 


PE/Cy7 anti-mouse CD45: BioLegend, #103114, clone 30-F11
RRID: AB_312979
Validated by the manufacturer by flow cytometry on C57BL/6 mouse splenocytes, 46 citations


PE/Cy7 anti-mouse TER-119: BioLegend, #116222, clone TER-119
RRID: AB_2281408
Validated by the manufacturer by flow cytometry on C57BL/6 mouse bone marrow cells, 6 citations


APC anti-mouse CD202b (Tie-2, CD202): BioLegend, #124010, clone TEK4
RRID: AB_10897105
Validated by the manufacturer by flow cytometry on bEnd.3 mouse endothelial cells, 3 citations


PE anti-mouse CD202b (Tie-2, CD202): BioLegend, #124008, clone TEK4
RRID: AB_2287338
Validated by the manufacturer by flow cytometry on bEnd.3 mouse endothelial cells, § citations 


APC anti-mouse CD105: BioLegend, #120414, clone MJ7/18
RRID: AB_2277914
Validated by the manufacturer by flow cytometry on bEnd.3 mouse endothelial cells, 5 citations


Pacific Blue anti-mouse CD105: BioLegend, #120412, clone MJ7/18
RRID: AB_209889
Validated by the manufacturer by flow cytometry on bEnd.3 mouse endothelial cells, 2 citations


APC anti-mouse CD90.2: BioLegend, #105312, clone 30-H12
RRID: AB_313183
Validated by the manufacturer by flow cytometry on C57BL/6 mouse thymocytes, 12 citations


FITC anti-mouse Ly-51 (6C3): BioLegend, #108305, clone 6C3
RRID: AB_313362
Validated by the manufacturer by flow cytometry on C57BL/6 mouse bone marrow cells, 6 citations


PerCP-eFluor 710 anti-mouse CD200: eBioscience, #46-5200-82, clone OX90
RRID: AB_10598213
Validated by the manufacturer by flow cytometry on C57BL/6 splenocytes, 1 citation


Biotin anti-mouse CD51: BD Pharmingen, #551380, clone RMV-7
RRID: AB_394174
Validated by the manufacturer by flow cytometry on BALB/c bone marrow leukocytes, & citations 


PE anti-Ki67: BD Pharmingen, #556027, clone B56
RRID: AB_2266296
Validated by the manufacturer by flow cytometry on permeabilized MOLT-4 cells, 14 citations


PE anti-active Caspase 3: BD Pharmingen, #550821, clone C92-605
RRID: AB_393906
Validated by the manufacturer by flow cytometry on camptothecin-treated Jurkat cells, 3 citations


AlexaFluor 647 anti-SOX9: Cell Signaling Technology, #71273, clone D8G8H
RRID: AB_2799799
Validated by the manufacturer by flow cytometry on HeLa cells (blue) and A-204 cells, 12 citations


APC anti-mouse CD146: BioLegend, #134712, clone ME-9F1
RRID: AB_2563088
Validated by the manufacturer by flow cytometry on mouse endothelial cells, 3 citations


Pacific Blue anti-mouse F4/80: BioLegend, #123124, clone BM8
RRID: AB_893475
Validated by the manufacturer by flow cytometry on thioglycolate-elicited BALB/c mouse peritoneal macrophages, 21 citations


PerCP/Cy5.5 anti-mouse CD31: BioLegend, #102522, clone MEC13.3
RRID: AB_2566761
Validated by the manufacturer by flow cytometry on C57BL/6 mouse splenocytes, 11 citations


Primary antibodies for immunohistochemical analysis:


BrdU: Bio-Rad, #OBT0030, clone BU 1/75-ICR1, rat
RRID: AB_609568
Validated by the manufacturer for immunohistochemistry on formalin-fixed paraffin-embedded tissue, 31 citations


CD31: BD Biosciences, #550274, clone MEC 13.3, rat
RRID: AB_393571
Validated by the manufacturer for immunohistochemistry on zinc-fixed paraffin-embedded section of U-87 MG tumor in mouse
brain, 7 citations


type I collagen: Novus Biologicals, clone NB600-408, polyclonal, rabbit
RRID: AB_343276
Validated by the manufacturer for immunohistochemistry on FFPE sections of mouse pancreas tissue and rat colon tissue, 28
citations


type II collagen: Merck Millipore, MAB8887, monoclonal, mouse, clone 6B3
RRID: AB_2260779
Validated by the manufacturer for immunohistochemistry on fetal cartilage, 30 citations


SOX9: Novus Biologicals, NBP1-85551, polyclonal, rabbit
RRID: AB_11002706
Validated by the manufacturer for immunohistochemistry on FFPE sections of human colorectal cancer, glioma, skeletal muscle
and small intestine, 4 citations


CPT1a: Cell Signaling Technology, #12252, clone D3B3, rabbit
RRID: AB_2797857
Validated by the manufacturer on HeLa, PANC-1 and MCF7 cells, 16 citations


GLUT1: Cell Signaling Technology, #12939, clone D3J3A, rabbit
RRID: AB_2687899
Validated by the manufacturer on HepG2 and Huh6 cells, 9 citations


FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit
RRID: AB_836876
Validated by the manufacturer on SH-SY5Y cells treated with IGF-I or LY294002, 230 citations


Primary antibodies for immunocytochemical analysis:


FoxO1: Cell Signaling Technology, #2880, clone C29H4, rabbit
RRID: AB_2106495
Validated by the manufacturer for immunofluorescent analysis in IGROV-1 cells, 409 citations


FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit
RRID: AB_836876
Validated by the manufacturer for immunofluorescent analysis on SH-SY5Y cells, 230 citations


Primary antibodies for Western Blot: 


SOX9: Novus Biologicals, NBP1-85551, polyclonal, rabbit
RRID: AB_11002705
Validated by the manufacturer for western blot analysis on HEK293T cells, mouse NIH-3T3 cells and rat NBT-II cells, 2
citations


FoxO1: Cell Signaling Technology, #2880, clone C29H4, rabbit
RRID: AB_2106495
Validated by the manufacturer for western blot analysis on extracts from IGROV-1 and COS-7 cells, 409 citations


FoxO3a: Cell Signaling Technology, #2497, clone 75D8, rabbit
RRID: AB_836876
Validated by the manufacturer for western blot analysis on extracts from Jurkat and PC3 cells, 230 citations


LC3B: Cell Signaling Technology, #3868, clone D11, rabbit
RRID: AB_2137707
Validated by the manufacturer for western blot analysis on extracts of various cell lines treated with chloroquine, 31 citations


β-actin: Sigma-Aldrich, #A5441, clone AC-15, mouse
RRID: AB_476744
Validated by the manufacturer for western blot analysis on cultured human or chicken fibroblast cell extracts, 299 citations


Lamin A/C: Santa Cruz Biotechnology, #sc-376248, clone E-1, mouse
RRID: AB_10991536
Validated by the manufacturer for western blot analysis on cell extracts of different mouse and human cell lines, 4 citations


Primary antibodies for ChIP-qPCR:


FoxO1: Abcam, #ab39670, rabbit
RRID: AB_732421
Validated by the manufacturer for ChIP analysis on mouse T cells, 3 citations


FoxO3a: Abcam, #ab12162, rabbit
RRID: AB_298803
Validated by the manufacturer for ChIP analysis on pig coronary artery endothelial cells, 30 citations




Eukaryotic cell lines 


Policy information about cell lines


Cell line source(s) C3H10T1/2 cells were obtained from the RIKEN Cell Bank.
Authentication None of the cell lines used were authenticated.
Mycoplasma contamination Cell lines were routinely tested for mycoplasma contamination and found negative.


Commonly misidentified lines The C3H10T1/2 cell line is not among the commonly misidentified cell lines.
(See ICLAC register)


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Analysis was performed on 3-5-day-old mice or 8-10-week-old male and female mice. C57BL/6J mice (Janvier Labs), 129/Sv mice
(Janvier Labs), B6.Cg-Tg(CAG-EGFP) mice (Hadjantonakis, A. K. et al., Mech. Dev., 1998), B6.Cg-Tg(Col1a1-cre/ERT2,-DsRed)1Smkr/J
mice (Ouyang, Z. et al., Bone, 2014), B6;129S4-Sox9tm1.1Tlu/J mice and B6.Cg-Tg(CAG-DsRed*MST)1Nagy/J
mice (The Jackson Laboratory) were used in this study.
All colonies were housed and bred in individually ventilated cages in the animal facility of the KU Leuven.


Wild animals The study did not use any wild animals.

Field-collected samples The study did not include field-collected samples.

Ethics oversight All animal experiments were conducted according to the regulations and with approval of the Animal Ethics Committee of the KU
Leuven.


Note that full information on the approval of the study protocol must also be provided in the manuscript.


Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a ‘group’ is an analysis of identical markers).


All plots are contour plots with outliers or pseudocolor plots.


A numerical value for number of cells or percentage (with statistics) is provided.


Methodology 


Sample preparation Isolation of mouse skeletal stem cells was adapted from a previously described protocol (Chan, C. K. et al., Cell, 2015). Long
bones of 3-5-day-old mice were dissected, muscle was cleared away and bones were minced using a scalpel. Bone fragments
were then digested in α-MEM supplemented with 3 mg/ml collagenase II, mg/ml dispase (both from Gibco) and 100 U/ml DNase I
(Sigma) at 37°C for 45 minutes, with replacement of the digest medium every 15 minutes. Cell suspensions were passed
through a 70 µm cell strainer, washed with PBS containing 2% FBS and stained with antibodies against CD45, Ter119, Tie2,
CD105, CD90.2, 6C3 (BioLegend), CD51 (BD Pharmingen) and CD200 (eBioscience), and with the viability dye
7-aminoactinomycin D (7AAD; BD Pharmingen).


For the isolation of skeletal muscle-derived cell populations, hindlimb skeletal muscles, including quadriceps, soleus,
gastrocnemius and tibialis anterior, were dissected from 8-week-old CAG-DsRed mice, minced using a scalpel and digested in
α-MEM medium supplemented with 3 mg/ml collagenase II, mg/ml dispase and 100 U/ml DNase I at 37°C for 60 minutes. Every 15
minutes, samples were pipetted up and down vigorously using a 10 ml serological pipette to break up tissue fragments. Cell
suspensions were passed through a 70 µm nylon mesh, washed with PBS containing 2% FBS and stained with antibodies against
CD45, Ter119, CD31, F4/80 and CD146 (BioLegend), and with 7AAD (BD Pharmingen).


Instrument BD LSRII 
Software BD FACSDiva 
Cell population abundance Post-sort purity was not determined.


Gating strategy Phenotypic skeletal stem cells (7AAD-CD45-Ter119-Tie2-CD51+CD105-CD90.2-6C3-CD200+) were sorted on a BD FACSAria II (BD
Biosciences). Previously defined gating strategies (Chan, C. K. et al., Cell, 2015) were followed and are given in Extended Data
Fig. 9a.


Immunophenotypically defined macrophages (7AAD-CD45+F4/80+), endothelial cells (7AAD-CD45-Ter119-F4/80-CD31+CD146+)
and pericytes (7AAD-CD45-Ter119-F4/80-CD31-CD146+) were sorted on a BD FACSAria II. Gating
strategies are described in Extended Data Fig. 9b.


Analysis of SOX9high cells: Gating for SOX9high cells was set to have approximately 10% SOX9high cells in control conditions for
all cell types.
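
The gating rule stated above (a cutoff placed so that roughly 10% of control events count as SOX9high) can be illustrated with a minimal sketch. This is not the authors' analysis, which was done in flow cytometry software; the function names and the intensity values below are hypothetical and chosen only to show the idea of fixing the threshold on the control sample and then applying it unchanged to another sample.

```python
def gate_threshold(control_values, top_fraction=0.10):
    """Return the intensity cutoff above which roughly `top_fraction` of control events fall."""
    if not control_values:
        raise ValueError("control_values must not be empty")
    ordered = sorted(control_values)
    # Index of the first event that belongs to the top fraction of the control sample.
    cut_index = int(round(len(ordered) * (1.0 - top_fraction)))
    cut_index = min(max(cut_index, 0), len(ordered) - 1)
    return ordered[cut_index]


def fraction_high(values, threshold):
    """Fraction of events at or above the gate threshold."""
    return sum(v >= threshold for v in values) / len(values)


# Hypothetical SOX9 fluorescence intensities (arbitrary units), not data from the study.
control = [float(i) for i in range(1, 101)]        # 100 control events
treated = [float(i) * 1.5 for i in range(1, 101)]  # a made-up, shifted sample

threshold = gate_threshold(control)              # cutoff defined on the control only
control_high = fraction_high(control, threshold)
treated_high = fraction_high(treated, threshold)
```

Defining the cutoff on the control sample and reusing it for every condition is what makes the SOX9high percentages comparable between conditions.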


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
It has long been assumed that lifespan and healthspan correlate strongly, yet the two
can be clearly dissociated. Although there has been a global increase in human life
expectancy, increasing longevity is rarely accompanied by an extended healthspan.
Thus, understanding the origin of healthy behaviours in old people remains an
important and challenging task. Here we report a conserved epigenetic mechanism
underlying healthy ageing. Through genome-wide RNA-interference-based screening
of genes that regulate behavioural deterioration in ageing Caenorhabditis elegans, we
identify 59 genes as potential modulators of the rate of age-related behavioural
deterioration. Among these modulators, we found that a neuronal epigenetic reader,
BAZ-2, and a neuronal histone 3 lysine 9 methyltransferase, SET-6, accelerate
behavioural deterioration in C. elegans by reducing mitochondrial function,
repressing the expression of nuclear-encoded mitochondrial proteins. This
mechanism is conserved in cultured mouse neurons and human cells. Examination of
human databases shows that expression of the human orthologues of these
C. elegans regulators, BAZ2B and EHMT1, in the frontal cortex increases with age and
correlates positively with the progression of Alzheimer’s disease. Furthermore,
ablation of Baz2b, the mouse orthologue of BAZ-2, attenuates age-dependent
body-weight gain and prevents cognitive decline in ageing mice. Thus our genome-wide
RNA-interference screen in C. elegans has unravelled conserved epigenetic negative
regulators of ageing, suggesting possible ways to achieve healthy ageing.


Previous studies have shown that dopamine signalling declines with
age, and that higher dopamine levels in aged people improve
their cognitive functions. In C. elegans, an age-related decline in the
level of the BAS-1 protein, a shared serotonin (5-HT)- and dopamine-synthesizing
enzyme (DOPA decarboxylase; Fig. 1a), is responsible for
the loss of these neurotransmitters and a behavioural deterioration,
thus providing a genetically traceable marker of ageing in the nervous
system. We therefore performed a genome-wide RNA interference
(RNAi) screen for regulators of ageing by examining changes in the level
of the BAS-1 protein in C. elegans. We used transgenic worms (named
Pbas-1::bas-1::gfp) that express BAS-1 fused to green fluorescent protein
(GFP), and individually fed the worms with bacteria expressing different
double-stranded RNAs (dsRNAs) that correspond to roughly 80% of the total
predicted genes in C. elegans (Fig. 1b). To enhance neuronal uptake of
dsRNAs, we introduced SID-1, a channel allowing dsRNA diffusion, into
the nervous system of the transgenic worms.

After three rounds of testing, we obtained 59 screening hits that prevented
an age-related reduction in the BAS-1 protein level; these genes
encode various protein classes, including nucleic-acid-binding proteins,
receptors and transporters (Extended Data Fig. 1a-c and Supplementary
Table 1). We then performed gene-network analysis of these screening
hits using GeneMANIA software and constructed a coexpression
network that reveals the interaction among individual genes and their
partners (Extended Data Fig. 1d). In line with the aim of our screening,
we found ten hits whose human orthologues are involved in age-related
neurodegeneration or cell senescence (Extended Data Fig. 1d and Supplementary
Table 1). We next examined whether the genes corresponding
to the screening hits affect ageing-related behavioural decline, and found
that downregulation of the majority of 20 tested genes improved behavioural
performance in pharyngeal pumping in aged worms (Extended
Data Fig. 1e). Thus our genome-wide screen for regulators of ageing provides
a global view of molecules involved in the ageing nervous system.


BAZ-2, SET-6 and age-related decline 

Among the seven most prominent hits, a putative epigenetic reader,
BAZ-2, and a putative histone 3 lysine 9 (H3K9) methyltransferase, SET-6,
appeared at a key node in the network (Extended Data Fig. 1b, d). They
were broadly expressed in the C. elegans nervous system, including head,
body and tail neurons (Extended Data Fig. 2a), and were colocalized with a
Fig. 1 | A genome-wide screen of C. elegans identifies genes that regulate
age-related loss of 5-HT and dopamine. a, Fluorescent images (left) and
quantitative analysis (right) of BAS-1 expression at different ages in transgenic
worms. Quantitative analysis of BAS-1 levels was performed by measuring GFP
fluorescence intensity in the soma of the NSM neurons. White arrows indicate
the NSM neurons. Scale bar, 10 µm. The numbers of the total tested worms for
each data point are shown beneath the x-axis. b, Illustration showing the
genome-wide RNAi screen. An RNAi library containing 16,256 clones was
screened in the first round; RNAi clones that prevented a decline in the
BAS-1::GFP level in the transgenic worms at day 9 of adulthood were selected and
subjected to further rounds of testing. The clones that markedly increased the


nuclear stain, 4′,6-diamidino-2-phenylindole (DAPI), indicating that they
are nuclear proteins (Extended Data Fig. 2b). Notably, the expression of
BAZ-2 and SET-6 increased with age (Fig. 1c and Extended Data Fig. 2c, d).

We then investigated the function of BAZ-2 and SET-6 in modulating
the ageing process, and found that deleting baz-2, set-6 or both (double
mutant baz-2;set-6) increased the levels of BAS-1 (Extended Data
Fig. 2e), 5-HT and dopamine (Fig. 1d) in aged worms, but not in young
adult worms. We next examined whether the increase in endogenous
5-HT and dopamine caused by deleting baz-2 or set-6 could improve
behavioural performance in ageing worms. An age-related loss of 5-HT
and dopamine causes the decline of many important behaviours in
C. elegans, including pharyngeal pumping, male mating, and response
to food. We found that deleting baz-2 or set-6 prevented this age-related
behavioural decline, but had no effect on the behaviours of young
worms (Fig. 2a-d). This prevention could be reversed by re-expressing
baz-2 or set-6 through their own promoters in their respective mutant
worms (Fig. 2a-d). In support of the notion that epigenetic modulation
plays a critical part in modulating longevity, we found that deleting
baz-2 or set-6 caused a moderate extension of the worm lifespan (Fig. 2e)
and enhanced the worm’s capacity to resist certain environmental
insults (hydrogen peroxide, ultraviolet light and a 35°C heat shock;
Extended Data Fig. 3a-c). Moreover, a loss-of-function mutation in
daf-16, the key transcription factor in the insulin signalling pathway,
did not alter the effect of baz-2 or set-6 deletion on lifespan extension
and resistance to oxidative stress (Extended Data Fig. 3d), suggesting
that daf-16 is not required for the effects of baz-2 or set-6 on ageing. By
contrast, the effect of deleting baz-2 or set-6 was abolished in response
to dietary restriction caused by an eat-2 mutation, or in response to
reduced mitochondrial function caused by a clk-1 mutation (Extended
Data Fig. 3e, f). Thus, the effect of baz-2 or set-6 deletion on the ageing
process is likely to be mediated by mechanisms related to dietary
restriction and mitochondrial function.


BAS-1::GFP level in ageing transgenic worms in the third round of testing were
scored as positive clones. c, Age-dependent changes in transcription levels of
baz-2 and set-6; n = 4 independent experiments. d, Dopamine (left) and 5-HT
(right) levels in N2, baz-2, set-6 or baz-2;set-6 mutant worms at day 1 or day 9 of
adulthood. The neurotransmitter levels were determined by high-performance
liquid chromatography (HPLC). The numbers of independent assays are
indicated in each column. All data shown are means ± s.e.m.; *P<0.05;
**P<0.01; ***P<0.001; ns, not significant (c and d, one-way analysis of variance
(ANOVA) with Dunnett’s test; see Supplementary Information for exact
P values). In c and d, each data point represents the result of one independent
experiment.


BAZ-2 and SET-6 regulate H3K9 methylation 
Notably, we found that the regulation by baz-2 and set-6 of age-related
behavioural deterioration, longevity and stress response was not additive
(Fig. 2a-e and Extended Data Fig. 3a-c), suggesting that baz-2 and
set-6 act in the same genetic pathway. In support of this notion, we
found that endogenous SET-6 and BAZ-2 co-immunoprecipitated
(Extended Data Fig. 4a) in homogenates extracted from genome-edited
baz-2::GFP::FLAG;set-6::GFP::HA worms (with baz-2 tagged by GFP::FLAG
and set-6 tagged by GFP::HA, where HA is haemagglutinin). The BAZ-2
protein belongs to a family of evolutionarily conserved proteins that
contain the plant homeodomain (PHD) and the bromodomain, both
of which recognize modified histone tails; SET-6 is a putative H3K9
methyltransferase. To examine whether SET-6 has methyltransferase
activity, we performed in vitro histone methylation assays. Incubation
of purified truncated proteins containing the SET domain of SET-6
(SET-6SET) with calf histone substrates increased levels of H3K9 dimethylation
(H3K9me2) and trimethylation (H3K9me3), but not monomethylation
(H3K9me1) (Fig. 3a), indicating that SET-6 is a methyltransferase for
H3K9me2 and H3K9me3. Furthermore, we found that deleting baz-2
or set-6 reduced the level of global H3K9me3, but not H3K9me1 and
H3K9me2 (Extended Data Fig. 4b). Thus, BAZ-2 and SET-6 could function
together to regulate the methylation of H3K9 in C. elegans.
To investigate which genes are regulated by BAZ-2 and SET-6, we examined
the genome-wide distribution of binding sites for these two proteins
via chromatin immunoprecipitation followed by high-throughput
DNA sequencing (ChIP-seq) analyses in transgenic worms expressing
GFP-fused SET-6 or BAZ-2 (with their own promoters), using anti-GFP
antibodies. We found that BAZ-2 and SET-6 co-occupied the promoter
region of 2,383 genes (Fig. 3b, Extended Data Fig. 4c, d and Supplementary
Table 2), which account for 71.6% and 77.1% of all genes bound by
BAZ-2 and SET-6, respectively (P< 3x 10™; Fisher’s exact test). Among
the co-occupied genes, nuclear genes encoding nucleotide-binding
Fig. 2 | Deletion of baz-2 or set-6 delays age-related behavioural decline and
extends lifespan in C. elegans. a, b, Age-related decline in pharyngeal
pumping (a) and male mating (b) in worms with different genotypes. The
numbers of independent assays are indicated in parentheses (a) or are shown
beneath the bars (b). c, d, Basal slowing response (BSR) of well-fed worms
towards food (c), and enhanced slowing response (ESR) of food-deprived
worms (d), at day 1 (young) or day 9 (aged) of adulthood. BSR and ESR were
quantified by the frequency of body bends. The total numbers of tested worms
are shown beneath the bars. The E. coli strain HB101 was used as the food source
in this assay. In a-d, data shown are means ± s.e.m. e, Lifespan curves of worms
with different genotypes. Data represent the sum of animals in multiple
experiments; the numbers of independent assays and tested hermaphrodites
are indicated in parentheses. In a-e, *P<0.05; **P<0.01; ***P<0.001; ns, not
significant (see Supplementary Information for exact P values). a, b, One-way
ANOVA with Dunnett’s test; c, d, two-tailed t-test; e, two-sided log-rank test.


proteins, metal-binding proteins, ribosomal proteins and mitochondrial
proteins were enriched (Fig. 3c, d). The occupancy of the endogenously
expressed BAZ-2 and SET-6 at the promoter region of those nuclear genes
encoding mitochondrial proteins was confirmed by ChIP quantitative
polymerase chain reaction (qPCR) analysis in genome-edited baz-2::GFP::FLAG
and set-6::GFP::HA worms, respectively (Extended Data Fig. 4e,).

We then investigated transcriptome changes caused by deleting set-6
or baz-2, and found that a total of 450 differentially expressed genes
were present in both baz-2 and set-6 mutant worms (Fig. 3e, Extended
Data Fig. 5a and Supplementary Table 3). These common genes account
for 32.3% and 40.3% of the total differentially expressed genes in baz-2
and set-6 mutant worms, respectively (P<1.3 x 10°™*; Fisher’s exact
test). Notably, ribosomal and mitochondrial proteins were the two
most enriched categories among the 450 common differentially expressed
genes (Fig. 3d), the majority of which were upregulated (Extended Data
Fig. 6a). We verified the upregulation of mitochondrial-function-related
genes by reverse transcription (RT)-qPCR analysis in both young adult
and aged worms (Extended Data Fig. 5b). We next grouped the 450 genes
into 5 gene clusters using the k-means method, and constructed heat
maps of the ChIP-seq data. Interestingly, we found that the binding of
BAZ-2 and SET-6 was more prominent at the promoter regions of genes
in cluster 2, among which genes related to mitochondrial and ribosomal
functions were enriched (Extended Data Fig. 6a), suggesting that BAZ-2
and SET-6 work together to repress the expression of
mitochondrial-function-related genes by occupying their promoter regions.
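
To make the overlap statistic in this paragraph concrete, the following minimal sketch computes a one-sided Fisher's exact (hypergeometric) P value for two gene lists sharing at least the observed number of genes. It is an illustration only, not the authors' analysis. The two list sizes are back-calculated from the stated percentages (450 is 32.3% and 40.3% of the respective totals), and the background of 20,000 genes is an assumption, since the text does not state which gene universe was used.

```python
from math import comb


def overlap_p_value(universe, set_a, set_b, overlap):
    """One-sided hypergeometric P value for an overlap at least this large."""
    upper = min(set_a, set_b)
    if not 0 <= overlap <= upper:
        raise ValueError("overlap must lie between 0 and the smaller set size")
    # Count the ways to choose set_b genes sharing exactly k genes with set_a, for k >= overlap.
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(universe, set_b)


common = 450                       # shared differentially expressed genes (from the text)
baz2_total = round(common / 0.323)  # back-calculated size of the baz-2 list
set6_total = round(common / 0.403)  # back-calculated size of the set-6 list
universe_size = 20000               # ASSUMPTION: background gene count, not given in the text

p = overlap_p_value(universe_size, baz2_total, set6_total, common)
expected_by_chance = baz2_total * set6_total / universe_size
```

Under these assumptions, the overlap expected by chance is on the order of tens of genes, far below the 450 observed, so the P value is extremely small; its exact magnitude depends on the assumed background size.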

We next determined whether BAZ-2 and SET-6 regulate gene expression
by regulating the H3K9 methylation levels of target genes in
C. elegans. We found that deleting baz-2 or set-6 reduced the level
of H3K9me3, but not of H3K9me1 and H3K9me2, at cluster 2 genes
(Extended Data Fig. 6b-e), suggesting that BAZ-2 and SET-6 regulate
the level of H3K9me3 on these genes. Thus, SET-6 and BAZ-2 repress
the expression of mitochondrial-function-related genes by regulating
the H3K9me3 levels of target genes.


BAZ-2, SET-6 and mitochondrial function

We then inquired whether SET-6 and BAZ-2 regulate mitochondrial
functions. We found that deleting baz-2 or set-6 markedly elevated, and
overexpressing either gene markedly reduced, two key mitochondrial activities
(ATP production and oxygen consumption) in both young adult and aged
worms (Fig. 3f, g). The enhanced mitochondrial activities were not due
to increased mitochondrial abundance, because we found that the
ratio of mitochondrial DNA to nuclear DNA was unaffected in baz-2,
set-6 and baz-2;set-6 mutant worms (Fig. 3h).

Mitochondrial proteins are encoded by both nuclear and mitochondrial
genomes. An imbalance between the expression of proteins
from these two sources activates the mitochondrial unfolded protein
response (UPRmt), which maintains mitochondrial proteostasis and
promotes longevity. Deleting baz-2 or set-6 enhanced the expression
of a set of nuclear genes encoding mitochondrial proteins, including
some mitochondrial ribosomal proteins (Extended Data Fig. 5b).
Indeed, we found that an imbalance between mitochondrial versus
nuclear proteins (Extended Data Fig. 5c) correlates with UPRmt activation
(as revealed by the UPRmt reporter Pay «::GFP) in baz-2, set-6
and baz-2;set-6 mutant worms (Fig. 3i, j). This UPRmt activation was abolished
by dsRNAs targeting ubl-5, a positive regulator of the UPRmt,
and by dsRNAs targeting atfs-1, a transcription factor mediating the
UPRmt (Fig. 3i, j). Furthermore, we found that attenuating UPRmt
by downregulating ubl-5 or atfs-1 with RNAi prevented the lifespan
extension and elevation of pharyngeal pumping ability induced by
baz-2 or set-6 deletion (Fig. 3k, l). Thus, these two epigenetic factors
prevent healthy ageing at least in part via regulating UPRmt activation.


BAZ-2 and SET-6 functions are conserved

C. elegans BAZ-2 has two mammalian homologues, BAZ2A and BAZ2B, as
does SET-6, namely EHMT1 and EHMT2 (Extended Data Fig. 7a, b). By
analysing the gene-expression profiles of the human prefrontal cortex,
we found that the expression of BAZ2B and EHMT1, but not of BAZ2A
and EHMT2, increased with age (Fig. 4a and Extended Data Fig. 8a, b).
Interestingly, like their C. elegans homologues, BAZ2B and EHMT1
co-immunoprecipitated (Extended Data Fig. 8c). Further ChIP-qPCR
analysis showed that both BAZ2B and EHMT1 could bind to a set of nuclear
genes encoding mitochondrial proteins in HEK293T cells (Extended
Data Fig. 8d). In mouse primary neuronal cultures, we found that
downregulating Baz2b or Ehmt1, or both, enhanced the expression of a set
of mitochondrial proteins (Extended Data Fig. 8e and Supplementary

Fig. 3 | BAZ-2 and SET-6 repress mitochondrial function. a, The purified
glutathione-S-transferase (GST)-fused SET domain of SET-6 (SET-6SET)
catalyses the methylation of H3K9me2/3, but not H3K9me1, in vitro. Images are
representative of three independent experiments. For gel source data, see
Supplementary Fig. 1. b, Venn diagram showing the number of genes that bind
to BAZ-2 (red) and SET-6 (green) in ChIP-seq data. c, ChIP-seq profiles of BAZ-2
or SET-6 binding at two nuclear genes, mrpl-49 and mrps-7, related to
mitochondrial function. The ChIP-seq signals are shown in the range [0-70] for
BAZ-2 ChIP/input and [0-50] for SET-6 ChIP/input. Scale bar, 1 kilobase (kb).
Three batches of worms were collected for the ChIP-seq measurements. d, Top
gene-ontology (GO) terms of 2,383 overlapping BAZ-2- and SET-6-binding
genes (black) and 450 overlapping differentially expressed genes in baz-2 and
set-6 mutant worms (grey). The enrichment of genes was analysed by DAVID
Bioinformatics Resources, and the Fisher exact test was used for statistical
analyses. e, Venn diagram showing differentially expressed genes in mRNA-seq


Table 4). The oxygen consumption and ATP production could also be
increased and reduced by downregulation and overexpression,
respectively, of Baz2b and Ehmt1 (Extended Data Fig. 8f-i). Thus, Baz2b
and Ehmt1 repress mitochondrial functions in mammalian neurons.


Baz2b modulates cognitive ageing in mice 

To further explore whether the epigenetic regulator Baz2b modulates
the behavioural deterioration of ageing mice, we constructed Baz2b-null
(Baz2b−/−) mice by deleting four base pairs in the Baz2b gene, which
resulted in a frameshift mutation and ablation of the Baz2b protein
(Extended Data Fig. 9a, b). Consistently, we found that Baz2b ablation
improved mitochondrial function in the hippocampus and cerebellum
of 12-month-old male mice (Extended Data Fig. 9c). Ablation of Baz2b
also prevented age-dependent weight gain in male mice (Fig. 4b, c),
suggesting that it alters energy metabolism in aged mice. Furthermore,




data from baz-2 and set-6 mutant worms. f-h, ATP level (f), oxygen-
consumption rate (g), and mitochondrial DNA (mtDNA)/nuclear DNA (nDNA)
ratio (h) in worms with different genotypes. OE, overexpression. The numbers
of independent assays are indicated in the columns. i, j, Fluorescent images (i)
and quantitative analysis (j) of worms expressing the UPRmt reporter Phsp-6::GFP
in the presence of control dsRNAs or dsRNAs targeting ubl-5 or atfs-1, from
independent experiments. k, Lifespan curves of worms in the presence of
control (left), ubl-5 (middle) or atfs-1 (right) dsRNAs. Data represent the sum of
animals in multiple experiments; the numbers of independent experiments
and of tested hermaphrodites are indicated in parentheses; two-sided log-rank
test. l, Age-dependent decline in pharyngeal pumping in the presence of
control (left), ubl-5 (middle) or atfs-1 (right) dsRNAs. n = 4 independent
experiments. In f-h, j, l, data shown are means ± s.e.m.; one-way ANOVA with
Dunnett's test. For all assays, *P < 0.05; **P < 0.01; ***P < 0.001 (see
Supplementary Information for exact P values).


although the null mutation of Baz2b did not affect the exploratory and
locomotive activities of mice (Extended Data Fig. 9d, e), it did improve
spatial learning in the Barnes maze (Fig. 4d) and spatial memory for new
locations (Fig. 4e) in old (older than 18 months) male mice. By contrast,
there was no apparent difference in these behavioural tests among young
(roughly three-month-old) Baz2b+/−, Baz2b−/− and wild-type male mice
(Fig. 4e and Extended Data Fig. 9f), and these male mice showed no
difference in lifespan (Fig. 4f). Thus, Baz2b contributes to the
age-related deterioration of mitochondrial function and cognitive
behaviour in mice.

Mitochondrial dysfunction has been implicated in the pathogenesis
of Alzheimer's disease, an example of unhealthy ageing of the
brain. We found in an existing human dataset that BAZ2B and EHMT1
expression in the prefrontal cortex correlates positively with the
progression of Alzheimer's disease (Extended Data Fig. 10a, b), and
negatively with the expression of key mitochondrial proteins (Extended
Data Fig. 10c-h). Given the conserved roles of BAZ2B and EHMT1 in


Nature | Vol 579 | 5 March 2020 | 121


[Figure 4 graphic, panels a-f. Legible labels: Years; 3 months, 17-19 months; Habituation, Acquisition 1, Acquisition 2, Acquisition 3; Old male mice, Young male mice; Age (days).]

Fig. 4 | Knockout of Baz2b improves spatial learning and memory abilities in
old mice. a, Transcription levels of BAZ2B and EHMT1 in the prefrontal cortex of
[…] panel are from the dataset GSE1572 (see ref. […]); expression values (n = 145
[…] data shown are means ± s.e.m. f, Lifespan curves of wild-type, Baz2b+/− and
Baz2b−/− male mice. In d-f, the numbers of tested mice are indicated in
parentheses. P values were determined by: c, one-way ANOVA with Dunnett's
test; d, two-way repeated-measures ANOVA with Dunnett's test; e, two-tailed
t-test; f, two-sided log-rank test.


regulating mitochondrial function, their increased expression might
contribute to mitochondrial dysfunction in Alzheimer's disease.

Here, by using genome-wide RNAi screening in C. elegans, we have
provided the first global view of genes that may regulate age-related
behavioural deterioration, and identified two repressive epigenetic
factors (BAZ-2/BAZ2B and SET-6/EHMT1) that prevent healthy ageing.
Notably, ablation of these factors promotes healthy ageing by improving
mitochondrial function and cognitive behaviour via the regulation of
H3K9 methylation levels at target genes (Extended Data Fig. 10i). These
findings suggest that preventing age-related mitochondrial impairment
by targeting repressive epigenetic regulators is a potential strategy for
improving behavioural performance and achieving healthy ageing.

Methods


Worm strains and culture

The wild-type Bristol N2, set-6(ok2195), TU3401 (sid-1(pk3321);uIs69),
daf-16(mu86), eat-2(ad1116) and clk-1(qm30) strains were obtained
from the Caenorhabditis Genetics Center. The baz-2(tm0235) strain
was obtained from the National Bioresource Project, Japan. The
set-6;baz-2 double mutant worms were generated by crossing
set-6(ok2195) with baz-2(tm0235). The TU3401;Pbas-1::bas-1::gfp strain was
generated by crossing TU3401 with Pbas-1::bas-1::gfp transgenic worms
(SQC0017). The baz-2;Pbas-1::bas-1::gfp and set-6;Pbas-1::bas-1::gfp strains
were generated by crossing baz-2(tm0235) and set-6(ok2195) with
Pbas-1::bas-1::gfp worms, respectively. The baz-2;daf-16 and set-6;daf-16
mutant worms were generated by crossing daf-16(mu86) with
baz-2(tm0235) and set-6(ok2195) animals, respectively. The baz-2;eat-2 and
set-6;eat-2 mutant worms were generated by crossing eat-2(ad1116)
with baz-2(tm0235) and set-6(ok2195) animals, respectively. The locus
of baz-2 is adjacent to that of clk-1, so the baz-2;clk-1 strain was
generated by deleting baz-2 (yfh0100 allele) in the genomic background
of clk-1(qm30) mutant worms using the CRISPR-Cas9 system. The
set-6;clk-1 worms were generated by crossing set-6(ok2195) with
clk-1(qm30). The Phsp-6::gfp transgenic strain SJ4100 was crossed with
set-6(ok2195), baz-2(tm0235) and set-6;baz-2 worms to express
Phsp-6::GFP in these mutant worms. All worms were cultivated at 20 °C on
nematode growth medium (NGM) plates seeded with Escherichia coli
OP50 unless stated otherwise.


Mice 
The null mutation in the Baz2b gene of C57BL/6J mice was generated
using the CRISPR-Cas9 system by the Suzhou Non-human Primate
Facility, Institute of Neuroscience, Chinese Academy of Sciences. The
single-guide RNAs (sgRNAs) targeting exon 2 of the Baz2b gene were
microinjected into the cytoplasm of C57BL/6J embryos with Cas9
messenger RNA. The injected embryos were transferred into the oviducts
of pseudopregnant mice to produce Baz2b mutant mice. The mutation
in the Baz2b gene was confirmed by PCR and DNA sequencing. The
sgRNA sequence was 5′-gaaactgctgaagccacgga-3′. After backcrossing
to wild-type C57BL/6J mice for three generations, Baz2b-null mice were
identified and used. Mice used for experiments were littermates from
crosses between Baz2b heterozygotes. All mice were housed under
specific pathogen-free conditions on a 12-h light/dark cycle (lights
were on from 07:00 to 19:00 every day).


Molecular biology 

The baz-2 gene with its 3,769-base-pair (bp) upstream promoter was
amplified from N2 genomic DNA and inserted into the pPD95.75 vector
to obtain the Pbaz-2::baz-2::gfp plasmid. The set-6 complementary
DNA (cDNA) was amplified from N2 transcripts. The set-6 cDNA and 981-bp
set-6 promoter were inserted into the pPD95.75 vector to generate
the Pset-6::set-6::gfp plasmid. To construct the Pbaz-2::baz-2::3×FLAG and
Pset-6::set-6::3×HA plasmids, we fused 3×FLAG and 3×HA sequences
to the termini of baz-2 DNA and set-6 cDNA, respectively. The fosmid
(catalogue number 00754076101505996 E05) expressing GFP-fused
BAZ-2 was obtained from TransgeneOme. To express SET-6SET proteins
(from amino acid 453 to amino acid 708 of SET-6), we
amplified the set-6SET DNA and inserted it into the pGEX-4T-1 vector
between the BamHI and EcoRI restriction-enzyme sites.

To express genes in mammalian cells, we inserted BAZ2B and EHMT1
cDNAs into pC-neo and pcDNA3.1(+) vectors, respectively. Mouse
Baz2b and Ehmt1 cDNAs were inserted into pKH3 and c-Myc-pCS2+MT
vectors, respectively. The small hairpin RNA (shRNA) sequences
targeting Baz2b (5′-ggctctttctccaagttaa-3′) and Ehmt1
(5′-gaggatagtaggacttcta-3′) or a nonsense negative control (NC) shRNA
sequence (5′-ttctccgaacgtgtcacgt-3′) were inserted into pLKD vectors
between the AgeI and EcoRI restriction-enzyme sites.


All plasmids were verified by DNA sequencing. We used the following
primers to construct the plasmids: baz-2 promoter forward,
5′-aacctgcaggaagtcctgcgacgacaag-3′ and reverse,
5′-tccccccgggttttggaaagaatttacatg-3′; baz-2 DNA forward,
5′-tccccccgggatgagtgataactcatctaatcag-3′ and reverse,
5′-ctcagacgccttcatcgttcaccggtgaca-3′; set-6 promoter forward,
5′-acatgcatgcgctcttttagaatataccaac-3′ and reverse,
5′-acgcgtcgactttctataagcagtaaac-3′; set-6 cDNA forward,
5′-cgcggatccatggaacgatctcgaactgg-3′ and reverse,
5′-ccgctcgagtgaatcttcgtcggacagttc-3′; BAZ2B cDNA forward,
5′-atggagtctggagaacggttacc-3′ and reverse,
5′-tcagctcactttgaaagtatctgtcc-3′; EHMT1 cDNA forward,
5′-atggccgccgccgatgccga-3′ and reverse, 5′-tcatagggggtcggcggcagc-3′;
Baz2b cDNA forward, 5′-atggagtctggagaactgttg-3′ and reverse,
5′-tcagctcactttgaaggtatctg-3′; Ehmt1 cDNA forward,
5′-ctcatttctgaagaggacttgaattcaatggccgccgctgatgctga-3′ and reverse,
5′-acgactcactatagttctagatcatagggggtcagcagcgg-3′.


Transgenic worms and CRISPR-Cas9 genome editing 
All transgenic worms were generated following the standard protocol
by injecting the respective plasmids. Detailed information on the
transgenic strains is as follows: SQC0508 yfhEx0508 (Pbaz-2::baz-2::gfp fosmid
(obtained from http://transgeneome.mpi-cbg.de/) at 50 ng µl−1;
rol-6(su1006) at 10 ng µl−1); SQC0505 yfhEx0505 (Pset-6::set-6::gfp at
50 ng µl−1; Plin-44::mCherry at 10 ng µl−1); SQC0509 yfhEx0509
(Pbaz-2::baz-2::gfp at 50 ng µl−1; Plin-44::mCherry at 10 ng µl−1; TM0235);
SQC0510 yfhEx0510 (Pset-6::set-6::gfp at 50 ng µl−1; Plin-44::mCherry at
10 ng µl−1; VC2683); SQC0519 yfhEx0519 (Pbaz-2::baz-2::3×FLAG at
50 ng µl−1; Pset-6::set-6::3×HA at 50 ng µl−1; Plin-44::gfp at 10 ng µl−1);
SQC0520 yfhEx0520 (Pset-6::set-6::3×HA at 50 ng µl−1; Plin-44::gfp at
10 ng µl−1).
CRISPR-Cas9-mediated genome editing was performed as
described. To construct the GFP::FLAG-tagged baz-2 and GFP::HA-tagged
set-6 genome-edited worms, we inserted GFP::FLAG and GFP::HA tags into
the 3′-ends of the baz-2 and set-6 genes, respectively, by Cas9-triggered
homologous recombination. The sgRNA sequences were inserted into the
pDD162 vector (Addgene). A homologous-repair template with a flexible
linker, the gfp coding sequence and the 3×FLAG (or 3×HA) coding sequence
was cloned into the pPD95.75 vector as the donor plasmid. The
sgRNA (50 ng µl−1), donor plasmids (10 ng µl−1) and 2.5 ng µl−1 pCFJ90
(Addgene) were co-injected into young adult N2 hermaphrodites, and
tag insertions were confirmed by PCR and DNA sequencing. The sgRNAs
targeting baz-2 were 5′-cagaaaaagttaaccggtta-3′ and
5′-aaatcatcccattgatgttc-3′; the sgRNAs targeting set-6 were
5′-tggcagttatgaatcttcgt-3′ and 5′-tgcacttcattgctgaactt-3′. To construct
baz-2(yfh0100) mutant worms, sgRNAs targeting baz-2 were inserted into
the pDD162 vector. The sgRNAs were 5′-ggaacatcagcatcaacgt-3′ and
5′-tttgaattatacacatcaaa-3′. The baz-2(yfh0100) mutant has a 1,026-bp
deletion and a 165-bp insertion. The deleted sequence covers part of
the first exon and the whole second exon (from base 156 to base 1,181
of the baz-2 genomic sequence). The insertion site is at base 155 of
the baz-2 genomic sequence.


RNAi screening 
To enhance the neuronal uptake of dsRNAs, we generated
TU3401;Pbas-1::bas-1::gfp worms by crossing Pbas-1::bas-1::gfp transgenic
worms with TU3401 worms, which express a dsRNA channel, SID-1, in
neurons. We performed a genome-wide RNAi screen using
TU3401;Pbas-1::bas-1::gfp worms. The RNAi clones from the Ahringer
C. elegans library (Source BioScience LifeScience) were cultured in liquid
broth with 100 µg ml−1 carbenicillin overnight and then seeded on NGM
plates containing 25 µg ml−1 carbenicillin, 1 mM isopropyl
β-D-1-thiogalactopyranoside (IPTG) and 20 µM 2′-deoxy-5-fluorouridine
(FUDR). Bacteria carrying the empty vector L4440 were used as the
control. Approximately 30 synchronized Pbas-1::bas-1::gfp;TU3401
transgenic worms were transferred to RNAi agar plates at the L4 stage.
The GFP intensity of NSM neurons in the transgenic worms was examined
ten days later. We selected those clones that were estimated to preserve
'high' BAS-1::GFP levels in more than 50% of tested aged worms for
re-examination in a second round of testing. 'High' means that the level
of GFP fluorescence in aged worms is comparable with that in young adult
worms. Only one plate with about 30 worms was analysed in the first
round of screening, and the fluorescence was estimated by a single
investigator. In the second round of testing, the clones obtained from
the first round of testing were re-examined independently by two
investigators for three biological repeats with the same clones. The
percentage of worms with 'high' GFP fluorescence was recorded, and RNAi
clones with a mean score (average of replicates) of more than 50% were
selected and re-examined in the third round of screening by measuring
GFP fluorescence using a confocal microscope. RNAi clones that markedly
increased BAS-1 levels in the third round of testing were regarded as
positive clones.
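The second-round selection rule (mean score across replicates of more than 50%) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' script: the function name `select_clones`, the clone names and the example percentages are all hypothetical.

```python
from statistics import mean


def select_clones(scores, threshold=50.0):
    """Return clones whose mean percentage of 'high'-GFP worms exceeds threshold.

    scores maps an RNAi clone name to a list of percentages, one per replicate
    (for example, two investigators x three biological repeats).
    """
    selected = {}
    for clone, percentages in scores.items():
        average = mean(percentages)
        # The text says "more than 50%", so the comparison is strict.
        if average > threshold:
            selected[clone] = average
    return selected


# Hypothetical second-round scores; none of these values come from the paper.
example_scores = {
    "clone_A": [60, 70, 55, 65, 50, 60],
    "clone_B": [40, 45, 50, 35, 55, 30],
    "L4440_control": [10, 15, 5, 10, 20, 10],
}

if __name__ == "__main__":
    print(sorted(select_clones(example_scores)))
```

With these made-up scores only `clone_A` passes, because its replicate mean is 60% while the other two fall below the cut-off.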


HPLC 
The dopamine and serotonin levels in C. elegans were detected using
HPLC as described. Briefly, age-synchronized worms were cultured on
NGM plates and harvested at day 1 or day 9 of adulthood in M9 buffer.
Then, 100 µl of packed worms were sonicated in 0.3 M perchloric acid
(containing 2 mM EDTA-2Na) and centrifuged at 17,000g for 15 min
at 4 °C. Samples were filtered through a 0.22-µm filter (Millipore),
and then a 25-µl sample was used for HPLC detection. Quantification
of dopamine and 5-HT was carried out by comparing peak areas and
retention times with the respective peak characteristics of commercial
standards (Sigma, USA). The protein level of each sample was quantified
by the Bradford assay. Dopamine and 5-HT levels were normalized to
the protein level of each sample.


C. elegans behavioural assays

Pharyngeal pumping rates of synchronized worms were scored on
cultivating plates. We examined the number of pharyngeal contractions
in 10 s using a dissection microscope as described.

Male mating assays were conducted as described. About 40 synchronized
L4 males were picked out and cultured on a 60-mm plate. Two
males of each genotype and two young adult N2 hermaphrodites were
picked into a 35-mm NGM plate seeded with an OP50 lawn and allowed
to mate for 24 h; the males were then removed. Successful mating was
defined as emergence of more than three male progeny in a mating
plate. The mating efficiency was calculated as the percentage of plates
with successful mating in total mating plates.
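The mating-efficiency definition above is a simple proportion, sketched here in Python. The function name `mating_efficiency` and the progeny counts are hypothetical and only illustrate the stated rule (more than three male progeny counts as a successful plate).

```python
def mating_efficiency(male_progeny_per_plate, min_males=3):
    """Percentage of mating plates with more than min_males male progeny."""
    if not male_progeny_per_plate:
        raise ValueError("at least one mating plate is required")
    # A plate is successful only if it has strictly more than min_males males.
    successes = sum(1 for n in male_progeny_per_plate if n > min_males)
    return 100.0 * successes / len(male_progeny_per_plate)


# Hypothetical counts of male progeny on ten mating plates.
plates = [0, 5, 12, 3, 8, 0, 4, 1, 9, 2]

if __name__ == "__main__":
    print(mating_efficiency(plates))
```

Five of the ten made-up plates exceed three male progeny, so the sketch reports a mating efficiency of 50%.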

We measured the basal slowing response (BSR) and enhanced slowing
response (ESR) of worms as described. We incubated 60-mm NGM
plates seeded with a ring of E. coli HB101 at 37 °C for 3 h, and then
allowed the plates to cool down to room temperature. Simultaneously, we
prepared NGM plates without food. For ESR, worms were deprived of food
for 30 min. Body bends within 20 s were scored under a dissection
microscope.

The lifespan of worms was examined as described, with some
modifications. A total of roughly 90 synchronized worms were seeded on
three OP50 plates at 20 °C. We assessed the number of dead worms
and picked them out daily.
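Daily death counts of this kind are usually turned into a lifespan curve with a Kaplan-Meier estimate. The Python sketch below assumes that approach; the text does not state how the curves were computed, and the handling of censored animals (worms lost before death) is an added assumption, as are the function name and the example days.

```python
def kaplan_meier(death_days, censored_days=()):
    """Return [(day, surviving_fraction)] for each day with at least one death.

    death_days: day of adulthood on which each dead worm was scored.
    censored_days: days on which worms were lost without a scored death
    (an assumption; censoring is not described in the text).
    """
    deaths = list(death_days)
    censored = list(censored_days)
    at_risk = len(deaths) + len(censored)
    survival = 1.0
    curve = []
    for day in sorted(set(deaths) | set(censored)):
        died_today = deaths.count(day)
        if died_today:
            # Product-limit step: multiply by the fraction surviving this day.
            survival *= 1.0 - died_today / at_risk
            curve.append((day, survival))
        # Deaths and censored worms both leave the risk set after this day.
        at_risk -= died_today + censored.count(day)
    return curve


if __name__ == "__main__":
    # Hypothetical data: four deaths and one worm lost on day 11.
    for day, fraction in kaplan_meier([10, 12, 12, 15], censored_days=[11]):
        print(day, round(fraction, 3))
```

Comparing two such curves would still require a log-rank test, which this sketch does not implement.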

We carried out RNAi of the ubl-5 and atfs-1 genes as described.
Activation of UPRmt in young adult worms was reported by Phsp-6::GFP
fluorescence. Pharyngeal pumping and lifespan were examined using plates
seeded with ubl-5, atfs-1 or control dsRNAs throughout the experiment.
We examined the pharyngeal pumping of worms treated with RNAi
for the genes identified in the screening hits in the presence of 20 µM
FUDR. The empty vector L4440 was used as a control.


Stress assays 

Hydrogen peroxide stress assay. About 20 synchronized worms at day
5 of adulthood were transferred to each well, which contained 800 µl
worm S-basal buffer with various concentrations of H2O2 in a 12-well plate
at 20 °C. Four hours later, 200 units of catalase (Sigma, C9322) were
added to neutralize the H2O2, and the mortality of worms was scored.


Ultraviolet stress assay. About 30 synchronized hermaphrodites at day
5 of adulthood were exposed to 1,600 J m−2 ultraviolet light in an NGM
plate without bacteria. After recovery, worms were cultured in standard
conditions, and the mortality of ultraviolet-treated worms was scored daily.


Heat-shock assay. About 30 synchronized hermaphrodites at day 5 of
adulthood were incubated at 35 °C, and their mortality was checked every 2 h.


Immunoprecipitation and western blotting 
For co-immunoprecipitation of BAZ-2 and SET-6, mixed stages of
transgenic worms or genome-edited worms were harvested and washed
three times with M9 buffer. Samples were frozen in liquid nitrogen and
ground to a fine powder using a pestle and mortar. The worm lysate was
sonicated in lysis buffer containing 50 mM HEPES pH 7.4, 1 mM EGTA,
1 mM MgCl2, 150 mM KCl, 10% glycerol, […] mM NaF, 1 mM
phenylmethylsulfonyl fluoride (PMSF) and complete protease-inhibitor
cocktail (Roche 11836145001). After treatment with NP-40 at a final
concentration of 0.05%, samples were lysed at 4 °C for […] h and then
centrifuged at 20,000g for 15 min at 4 °C to remove the debris. The
supernatant was incubated with anti-FLAG antibodies (conjugated to
agarose beads) overnight at 4 °C. After incubation, the agarose beads
were washed with lysis buffer and boiled in SDS sample buffer (2% SDS,
10% glycerol, 50 mM Tris-HCl, pH 6.8, 0.01% bromophenol blue, 5%
β-mercaptoethanol) for 10 min. Samples were separated on SDS-PAGE gels
and the standard western blot procedure was carried out. The primary
antibodies used were anti-FLAG (Abmart, M2008) and anti-HA (Roche,
11867431001).
For co-immunoprecipitation of BAZ2B and EHMT1, HEK293T cells
stably expressing BAZ2B-FLAG were transfected with 3×HA-EHMT1
plasmids. Cells were lysed with ice-cold NE10 buffer (20 mM HEPES
pH 7.5, 10 mM KCl, 1 mM MgCl2, 0.1% Triton X-100) and centrifuged at
1,000g for 10 min at 4 °C; nuclear proteins were then extracted with
NE150 buffer (20 mM HEPES pH 7.5, 10 mM KCl, 1 mM MgCl2, 0.1% Triton
X-100, 150 mM NaCl). Lysate was cleared by centrifugation at 20,000g
for 20 min at 4 °C. The supernatant was diluted with the same volume of
dilution buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 20% glycerol, 0.1%
Triton X-100, 0.4 mM EDTA) and then incubated with anti-FLAG
antibodies conjugated to agarose beads for 4 h at 4 °C for further analysis.
For immunostaining of H3K9 methylation, synchronized worms at
day 2 of adulthood, cultured on NGM plates containing 20 µM FUDR,
were harvested and washed several times in M9 buffer, then frozen
in liquid nitrogen. Worm samples were boiled in SDS sample buffer
for 15 min, separated on SDS-PAGE gels, and analysed by standard
western blotting. Primary antibodies used were anti-H3K9me1 (Abcam
ab9045), anti-H3K9me2 (Abcam ab115159), anti-H3K9me3 (Abcam
ab8898) and anti-H3 (Abcam ab1791). Secondary antibodies were IRDye
800CW goat anti-rabbit IgG (LI-COR P/N 925-3211) and IRDye 680RD
goat anti-rabbit IgG (LI-COR P/N 925-68071). The stained membranes
were scanned using the Odyssey CLx imaging system to measure
fluorescence at 800 nm and 700 nm. The normalized H3K9 methylation
levels were calculated by normalizing the ratio of H3K9 methylation
and histone H3 levels to that of N2 worms.
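The normalization described above is a ratio of ratios, which the following Python sketch makes explicit. The function name and the band intensities are hypothetical; only the arithmetic (mark/H3 in the sample divided by mark/H3 in N2) comes from the text.

```python
def normalized_methylation(mark_signal, h3_signal, n2_mark_signal, n2_h3_signal):
    """Return the H3K9 methylation level of a sample relative to N2.

    Each argument is a band intensity from the scanned membrane:
    the methylation mark and total histone H3 for the sample and for N2.
    """
    sample_ratio = mark_signal / h3_signal      # mark per unit of H3, sample
    n2_ratio = n2_mark_signal / n2_h3_signal    # mark per unit of H3, N2
    return sample_ratio / n2_ratio


if __name__ == "__main__":
    # Hypothetical intensities: a mutant with half the N2 H3K9me3/H3 ratio.
    print(normalized_methylation(400.0, 1000.0, 800.0, 1000.0))
```

By construction N2 itself has a normalized level of 1, so values below 1 indicate reduced methylation relative to wild type.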


In vitro histone methyltransferase assay
We incubated 1-2 µg of GST-fused SET-6SET proteins with 1-4 µg of
histone proteins (LS002544, Worthington) and S-adenosyl-L-(methyl-3H)
methionine (SAM; Sigma) in a mixture of 20 µl methylase activity buffer
(50 mM Tris-HCl, pH 8.0, 10% glycerol, 20 mM KCl, 5 mM MgCl2, […] mM
dithiothreitol (DTT) and 1 mM PMSF) for 16 h at 20 °C. We then added
SDS loading buffer and boiled the samples for 15 min. Proteins were
resolved on a 15% SDS-PAGE gel and visualized by western blotting.


Chromatin immunoprecipitation 
For ChIP assays, chromatin immunoprecipitation was performed as
described, with some modifications. Briefly, young adult or aged
worms were harvested and washed several times in M9 buffer, and then
washed twice in phosphate-buffered saline (PBS). Worms were lysed in
crosslinking buffer (1% formaldehyde in PBS with proteinase-inhibitor
cocktail) using a glass dounce homogenizer on ice, and fixed on a
Nutator shaker for 15 min at 37 °C. Fixed worm samples were quenched with
0.125 M glycine and washed in cold PBS buffer with protease inhibitors
three times. Then samples were resuspended in ice-cold FA buffer
(50 mM HEPES-KOH pH 7.5, 180 mM NaCl, 1 mM EDTA, 0.1% sodium
deoxycholate, 1% Triton X-100, 1 mM PMSF and protease-inhibitor
cocktail) with 0.1% SDS, and sonicated using a Bioruptor sonication
system (Diagenode UCD-200) at high amplitude for 10-15 cycles of
30 s on and 30 s off. The sonicated samples were centrifuged at 16,000g
for 15 min at 4 °C. Then, samples were precleared with protein A/G
agarose beads and immunoprecipitated overnight using GFP-trap
agarose beads (Chromotek ACT-CM-GFA0250) at 4 °C. After proteinase
K digestion and reverse crosslinking, the precipitated DNA and input
DNA were purified using phenol/chloroform, precipitated with ethanol
and subjected to DNA library construction or ChIP-qPCR assays.

For ChIP-seq analysis of histone H3K9 methylation, chromatin was
digested using 6 µl of micrococcal nuclease (MNase) (CST catalogue
number 1001S) in 400 µl of buffer B (CST catalogue number 7007)
containing 0.5 mM DTT for 20 min at 37 °C, and the reaction was then
stopped with 25 mM EDTA. About 10 µg of the digested chromatin was used
for immunoprecipitation, and 5% of the extract was saved as the input
sample. The antibodies used were H3K9me1 (Abcam ab9045), H3K9me2 (Abcam
ab1220) and H3K9me3 (Abcam ab8898).

For ChIP-seq library preparation, 20 ng purified DNA samples were
used. Library construction was performed using the QIAseq Ultralow
Input Library Kit (QIAGEN 180495) according to the manufacturer's
recommendations. DNA libraries were sequenced on an Illumina HiSeq
X-ten instrument, with 150-bp paired-end sequencing. Three batches
of worms were collected for each ChIP-seq measurement.

For ChIP-qPCR analysis of genes bound to BAZ2B or EHMT1 in
HEK293T cells, immunoprecipitation was performed using anti-IgG,
anti-FLAG (Sigma, F3165) or anti-EHMT1 (R&D SYSTEMS, PP-B042200)
antibodies. Immunoprecipitated DNA was purified using phenol/chloroform
extraction and then used for qPCR analysis. Primer sequences
for ChIP-qPCR are listed in the Supplementary Information.


ChIP-seq data analysis 

Raw reads were filtered using cutadapt (version 1.15) to obtain clean
reads with the following parameters: -q 20,20 -m 18 -a -A -o -p. Clean
reads were aligned to the reference worm genome (WormBase,
https://wormbase.org, version WS266) using the Bowtie2 aligner (version
2.3.3.1) with default parameters. ChIP-seq peaks were called using MACS
software (version 2.1.0.20150731). For BAZ-2 and SET-6 ChIP-seq datasets,
MACS2 was used to call the narrow peaks with the default parameters.
All called peaks were annotated using the HOMER (version 4.9, 2-20-2017)
annotatePeaks.pl function with custom annotation mode (using
the WS266 genome sequence and a GTF file). The aligned sequence-
alignment map (SAM) files were converted into binary alignment map
(BAM) files and then indexed using samtools software (version 1.5)
with 'samtools view -@ 10 -bS -q 30' and 'samtools index -@ 10'
parameters. To obtain read signal tracks for the Integrative Genomics
Viewer (IGV, Broad Institute), we converted indexed BAM files into bigwig
files using the reads per kilobase of transcript per million mapped reads
(RPKM) normalization method with deeptools (version 2.5.4), using
'bamCoverage -b -o --normalizeUsingRPKM --numberOfProcessors=
--extendReads 200 --binSize=10' parameters. Bedtools (version 2.26.0)
with 'bedtools intersect -a -b -wa -wb -e -f 0.5 -F 0.5' parameters was
used to obtain the overlapping peaks of BAZ-2 and SET-6. Genes with
enriched ChIP-seq reads were used to carry out functional-enrichment
analysis on the DAVID website (https://david.ncifcrf.gov, version 6.8),
using the functional module. Enriched items were represented as a bar
plot using the seaborn package (version 0.8.1) of Python. In-house
Python scripts integrated with the metaseq (version 0.5.5.4) framework
were used to generate heatmap plots of the differentially expressed
genes (obtained by RNA sequencing). The differentially expressed
genes were listed in the same order for heatmaps of both mRNA-seq
data and BAZ-2/SET-6 ChIP-seq data.
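The overlap criterion given to bedtools (`-f 0.5 -F 0.5 -e`) keeps a pair of peaks when the shared region covers at least half of either peak. The pure-Python sketch below re-implements that rule for illustration only; it is not bedtools, the function names are hypothetical, peaks are assumed to be `(chromosome, start, end)` tuples with half-open coordinates, and the example coordinates are invented.

```python
def passes_overlap(peak_a, peak_b, min_fraction=0.5):
    """True if the overlap covers >= min_fraction of peak_a OR of peak_b."""
    if peak_a[0] != peak_b[0]:
        return False  # different chromosomes never overlap
    overlap = min(peak_a[2], peak_b[2]) - max(peak_a[1], peak_b[1])
    if overlap <= 0:
        return False
    fraction_a = overlap / (peak_a[2] - peak_a[1])
    fraction_b = overlap / (peak_b[2] - peak_b[1])
    return fraction_a >= min_fraction or fraction_b >= min_fraction


def overlapping_peaks(peaks_a, peaks_b, min_fraction=0.5):
    """Return every (a, b) pair that satisfies the overlap rule."""
    return [
        (a, b)
        for a in peaks_a
        for b in peaks_b
        if passes_overlap(a, b, min_fraction)
    ]


# Hypothetical peak lists standing in for BAZ-2 and SET-6 peak calls.
baz2_peaks = [("I", 100, 300), ("I", 1000, 1200), ("II", 50, 150)]
set6_peaks = [("I", 250, 900), ("I", 1050, 1150), ("II", 400, 500)]

if __name__ == "__main__":
    for pair in overlapping_peaks(baz2_peaks, set6_peaks):
        print(pair)
```

The all-against-all comparison is quadratic and is meant only to make the criterion concrete; bedtools uses indexed interval lookups and should be preferred for real peak sets.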


RNA sequencing 
Worms were synchronously cultured on NGM plates, and transferred
to NGM plates containing 20 µM FUDR. Worms were harvested at day
2 of adulthood, washed several times using M9 buffer, and then lysed
with TRIzol reagent (Invitrogen). Total RNA was extracted using the
RNeasy Mini Kit (QIAGEN). RNA quality was assessed with the Agilent
2100 bioanalyser system, and samples with an RNA integrity number (RIN)
above 8.0 were used to construct the library. Then, mRNAs were purified
with oligo(dT) magnetic beads and cDNAs were synthesized using
oligo(dT) primers. Bead-bound cDNAs were digested with the restriction
enzyme NlaIII and ligated with the Illumina adaptor 1 at the sticky
5′ end. Bead-bound cDNA fragments were then digested with MmeI and
ligated with Illumina adaptor 2 at the 3′-ends. After linear PCR
amplification, products were purified by PAGE electrophoresis. Sequencing
was performed with an Illumina HiSeq 2000 sequencer. Two batches
of worms were collected for mRNA sequencing.


RNA-seq analysis 

Differentially expressed genes were defined with the following
criteria: upregulated genes (false discovery rate (FDR) less than 0.001,
log2-transformed fold change greater than 1, and transcripts per million
(TPM) greater than 5); downregulated genes (FDR less than 0.001,
log2-transformed fold change less than −1, and TPM greater than 5); and
other genes, representing those that are not differentially expressed.
We used the seaborn package of Python to plot scatter figures.
Differentially expressed genes of each sample (baz-2 and set-6 mutant worms)
were screened out and their log2-transformed fold-change values
were collected to generate a pandas DataFrame for further use.
Heatmaps with five k-means clusters were generated using the seaborn
clustermap function. Each gene cluster from the heatmap was used to
analyse Gene Ontology (GO) function enrichment on the DAVID website.
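The three-way classification above can be written as a short Python function. This is a sketch under stated assumptions: the function name and example values are hypothetical, and because it is unclear from the text which sample's TPM the expression filter refers to, a single `tpm` value is passed in by the caller.

```python
def classify_gene(fdr, log2_fold_change, tpm,
                  fdr_cutoff=0.001, lfc_cutoff=1.0, tpm_cutoff=5.0):
    """Label a gene 'up', 'down' or 'unchanged' using the stated thresholds.

    tpm is the expression value used for the abundance filter; which sample
    it is taken from is an assumption left to the caller.
    """
    if fdr < fdr_cutoff and tpm > tpm_cutoff:
        if log2_fold_change > lfc_cutoff:
            return "up"
        if log2_fold_change < -lfc_cutoff:
            return "down"
    return "unchanged"


# Hypothetical genes: (FDR, log2 fold change, TPM).
example_genes = {
    "gene_1": (0.0001, 2.3, 40.0),
    "gene_2": (0.0001, -1.8, 12.0),
    "gene_3": (0.02, 3.0, 50.0),     # fails the FDR cut-off
    "gene_4": (0.0001, 1.5, 2.0),    # fails the TPM filter
}

if __name__ == "__main__":
    for name, values in example_genes.items():
        print(name, classify_gene(*values))
```

The log2 fold changes of the genes labelled 'up' or 'down' are the values that would then be gathered into a table for clustering.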


qPCR for mRNA and mtDNA quantification

We collected synchronized young adult or aged worms and washed off
bacteria with M9 buffer. Total RNAs were extracted and cDNAs were
generated using the QuantiTect reverse transcription kit (Qiagen,
catalogue number 205314). RT-PCR reactions were conducted using
the SYBR Premix Ex Taq kit (Takara, RR420A) and a LightCycler 480
(Roche). To quantify mitochondrial DNA (mtDNA), we lysed 20 young
adult worms in 40 µl worm PCR lysis buffer (50 mM KCl, 10 mM Tris
pH 8.3, 2.5 mM MgCl2, 0.45% NP-40, 0.45% Tween-20, 0.01% gelatin, with
freshly added 0.1 mg ml−1 proteinase K). DNAs were released by heating
the worms in a PCR cycler at 65 °C for 90 min and 95 °C for 15 min. The
worm lysate was diluted 50 times with nuclease-free water and 5 µl of
the sample was used as the template for RT-PCR. The ratio of values
for a mitochondrial gene, nd-1, and a nuclear gene, act-3, is used as the
relative level of mtDNA per nuclear genome. Primers used for RT-PCR
are listed in Supplementary Information.
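One common way to turn the two qPCR readouts into an mtDNA-to-nDNA ratio is the ΔCt method, sketched below in Python. The text says only that a ratio of nd-1 to act-3 values was used, so the ΔCt formula, the assumed amplification efficiency of 2 and the example Ct values are all assumptions.

```python
def mtdna_per_nuclear_genome(ct_mito, ct_nuclear, efficiency=2.0):
    """Relative mtDNA copies per nuclear genome from threshold cycles.

    ct_mito: Ct of the mitochondrial amplicon (nd-1 in the text).
    ct_nuclear: Ct of the nuclear amplicon (act-3 in the text).
    efficiency: amplification factor per cycle (2.0 means perfect doubling).
    """
    # A lower Ct means more template, so the ratio grows as ct_mito drops.
    return efficiency ** (ct_nuclear - ct_mito)


def relative_to_control(sample_ratio, control_ratio):
    """Express a sample's mtDNA/nDNA ratio relative to a control strain."""
    return sample_ratio / control_ratio


if __name__ == "__main__":
    # Hypothetical Ct values for a control and a mutant lysate.
    control = mtdna_per_nuclear_genome(ct_mito=15.0, ct_nuclear=20.0)
    mutant = mtdna_per_nuclear_genome(ct_mito=15.0, ct_nuclear=20.0)
    print(control, relative_to_control(mutant, control))
```

Identical Ct differences in the two strains give a relative ratio of 1, which is the outcome expected when mitochondrial abundance is unchanged.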


Measurement of ATP and oxygen consumption 

To quantify worm ATP levels, we collected about 150 young adult or 
aged worms, washed them three timesto remove bacteria, and resus- 
pended them in 100 pl M9 buffer. After five freeze/thaw cycles (from 
liquid nitrogen to 40°C water), worm pellets were boiled for 25 min. 
Samples were cooled on ice and centrifuged at 11,000gfor 10 min at 
4°C. The supernatant was carefully transferred to a new tube. The 
samples were diluted four times with double-distilled water and then 
subjected to ATP measurement. The results were normalized to the 
protein level of each sample.

To measure the oxygen-consumption rates (OCRs) in C. elegans, we
rinsed off about 200 synchronized young adult and aged worms and
washed them three times with M9 buffer by gravity separation. The
worm pellets were divided into five replicate tubes containing 1 ml M9
buffer. After standing for 30 min, the samples were pipetted into wells
of Seahorse assay plates, and 1 ml M9 buffer was placed into the blank
well. Oxygen consumption was measured ten times at 22 °C with the
following protocol: 1, calibrate probes; 2, loop 10 times; 3, mix 2 min;
4, time delay 2 min; 5, measure 2 min; 6, loop end. The OCR value was 
normalized to the number of worms per well. 
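
The per-worm normalization can be written out as a short sketch. It assumes the ten loop readings of a well are averaged before dividing by the worm count, and it treats subtraction of the blank-well reading as an optional step; both choices, along with the function name and the example numbers, are illustrative assumptions rather than details given in the text.

```python
from statistics import mean
from typing import Sequence


def ocr_per_worm(readings: Sequence[float], worms_in_well: int,
                 blank_reading: float = 0.0) -> float:
    """Average the repeated OCR readings of one well and divide by worm number.

    blank_reading is the mean signal of the M9-only well (assumed to be
    subtracted here; instrument software may already do this).
    """
    if worms_in_well <= 0:
        raise ValueError("worms_in_well must be positive")
    if not readings:
        raise ValueError("at least one reading is required")
    return (mean(readings) - blank_reading) / worms_in_well


# Hypothetical readings for one well holding 20 worms.
print(ocr_per_worm([100.0, 110.0, 90.0], worms_in_well=20))
```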

To measure ATP levels in mammalian cells, we lysed primary cultured
neurons in 10 mM Tris-HCl (pH 7.6), 1 mM EDTA and 0.5% Triton X-100
buffer for 20 min at 4 °C. Samples were then centrifuged at 20,000g
for 10 min at 4 °C, and the supernatants were diluted 20 times with lysis
buffer. ATP levels of the samples were measured using a bioluminescence
detection kit (Promega G7570). The ATP level was normalized to the
protein level of each sample. OCRs of primary neurons were measured
using the Seahorse XF24 analyser (Agilent). The culture medium was
changed to XF base medium containing 10 mM glucose, 2 mM glutamax,
1 mM sodium pyruvate and 2% B27, pH 7.4, and then cells were cultured
in a 37 °C incubator without CO2 for 1 h. After incubation, OCRs were
measured and the following respiratory poisons were sequentially added
to the assay plate at final concentrations of: oligomycin A, 1 µM
(Selleckchem S1478); FCCP, 4 µM (Sigma C2920); rotenone, 1 µM (Sigma R875).
The results were normalized to the total protein amount in each well.

To measure mitochondrial OCRs, we isolated mitochondria from
different brain regions of Baz2b+/-, Baz2b-/- and wild-type male mice
as described. Mitochondrial OCRs were measured using a Seahorse
XF24 analyser. We plated 10 µg mitochondria in 50 µl mitochondrial
assay solution (MAS) buffer (70 mM sucrose, 220 mM mannitol, 5 mM
KH2PO4, 5 mM MgCl2, 2 mM HEPES, 1 mM EGTA, 0.2% fatty-acid BSA,
pH 7.2) in each well of an XF24 plate, and added MAS buffer without
mitochondria in four wells for background correction. We centrifuged
mitochondria at 2,000g at 4 °C for 10 min and then added 450 µl MAS
buffer containing substrates (mM succinate, 5 mM malate, 5 mM glutamate,
5 mM pyruvate) to each well. Mitochondrial OCRs were measured
by sequentially adding 0.6 mM ADP, 18 µM oligomycin, 9 µM FCCP and
5 µM rotenone together with 2 µM antimycin to the assay plate.


Primary neuron culture 

Primary cortical and cerebellar neurons were prepared from cerebral
and cerebellar tissues of mice at embryonic day 14.5 (E14.5) and postnatal
day 6 (P6), respectively, as described. Baz2b and Ehmt1 plasmids
were electroporated into primary cortical neurons using Nucleofector
(Lonza) in the presence of P3 primary cell 4D-Nucleofector solution
(Lonza, V4XP-3024). The transfected neurons were seeded into plastic
culture dishes, coated with poly-D-lysine (Sigma, P7280) and laminin
(Invitrogen 23017015), at a density of 200,000 cells per square centimetre.
The neurons were maintained in serum-free neurobasal medium
(Invitrogen, 21103049) supplemented with B-27 (Invitrogen, 17504044),
2 mM glutamax-I (Invitrogen, 35050061), and penicillin/streptomycin
(Hyclone, SV30010). The electroporated neurons were assayed at day
in vitro (DIV) 2 or 3. For shRNA-mediated downregulation of Baz2b and
Ehmt1, primary cerebellar neurons at DIV3 were infected with lentivirus
carrying Baz2b or Ehmt1 shRNA sequence and cultured in serum-free
NeuroBasal-A medium (Thermo, 10888022), 2% B27 supplements, 1%
glutamax-I, 1% penicillin/streptomycin and 250 µM KCl. Transfected
neurons were harvested at DIV8 for further experiments.


Mouse behavioural assays 

Young (3-month-old) and old (more than 18-month-old) male Baz2b+/-,
Baz2b-/- mice and wild-type littermates were used in all experiments.
The ages of old Baz2b+/-, Baz2b-/- and wild-type mice were 21.1 ± 3.3,
21.2 ± 2.9 and 21.1 ± 3.4 months, respectively. Animals were handled
once a day for one week before behavioural tests. All procedures were
approved by the Animal Care and Use Committee of the Institute of
Neuroscience, Chinese Academy of Sciences, Shanghai, China.


Open field test

The open field test was performed as described. Mice were placed in
the centre of a polystyrene box (40 cm × 40 cm × 40 cm) and the behavioural
activity of each mouse was recorded for 15 min using the EthoVision
video tracking system (Noldus, Wageningen, The Netherlands).


Barnes maze test 

A modified Barnes maze test was performed as described, using an
opaque polystyrene disc of 120 cm in diameter. The maze contained
40 holes, and a black polystyrene escape box was placed under one of
these holes. Distinct visual cues around the maze were used throughout
the study, and an overhead light was used as an aversive stimulus.
Mice were trained for four consecutive days. Four trials at intervals of
15 min were performed each day. At the beginning of training on the
first day, mice were gently guided by the experimenter to the escape
hole and were then covered with a black box for 2 min. In each training
trial, mice were given 3 min to find the escape hole. If they found
the escape hole within 3 min, the exact time they spent in finding the
hole was scored as the escape latency. If they failed to find the escape
hole within 3 min, they were gently guided towards the hole and the
allotted investigation time was regarded as the escape latency. Before
each trial, the escape box and the maze were cleaned with 75% alcohol.
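
The scoring rule for escape latency amounts to capping each trial at the 3-min limit. The sketch below is one way to express it, assuming that a failed trial is recorded as `None` and that the four daily trials are averaged per mouse; the averaging step and all names are illustrative assumptions, not details stated in the text.

```python
from statistics import mean
from typing import Optional, Sequence

MAX_TRIAL_S = 180.0  # the 3-min limit given to each mouse per trial


def escape_latency(time_to_hole_s: Optional[float]) -> float:
    """Score one trial: the time taken, or the full limit if the hole was not found."""
    if time_to_hole_s is None or time_to_hole_s > MAX_TRIAL_S:
        return MAX_TRIAL_S
    return time_to_hole_s


def daily_mean_latency(trials: Sequence[Optional[float]]) -> float:
    """Average the scored latencies of one mouse's trials on a given day."""
    return mean(escape_latency(t) for t in trials)


# Hypothetical day with four trials; the second trial was a failure.
print(daily_mean_latency([30.0, None, 90.0, 60.0]))
```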


Novel-location recognition 

A novel-location-recognition test was performed as described. Mice
were placed in the training arena (24 cm × 24 cm × 24 cm) for one 10-min
session without objects and for another three 10-min sessions with
two distinct objects. During the intersession interval of 3 min, mice
were returned to their home cages. The objects used were a conical
flask (height 8 cm; depth 4 cm) and a toy brick (8 cm × 4 cm × 4 cm).
Before each session, the arena and the objects were cleaned with 75%
alcohol. Twenty-four hours after training, mice were tested for 10 min
in the original arena, in which one of the objects (displaced object, DO) was
displaced to a new place and the other object (nondisplaced object,
NDO) was not moved. During training and testing, exploration was
recorded with a digital camera and scored using software described in
ref. by an experimenter who was blind to genotypes. Exploration was
defined as sniffing or touching the objects with the nose (but not climbing
on, turning around or biting objects). The discrimination index is
calculated as follows: (time exploring the DO minus time exploring the
NDO), divided by (time exploring the DO plus time exploring the NDO).
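
The discrimination index defined above ranges from -1 (only the nondisplaced object explored) to +1 (only the displaced object explored), with 0 meaning no preference. A minimal sketch follows; the function name and the example times are hypothetical.

```python
def discrimination_index(time_do_s: float, time_ndo_s: float) -> float:
    """(DO - NDO) / (DO + NDO), using exploration times in seconds."""
    if time_do_s < 0 or time_ndo_s < 0:
        raise ValueError("exploration times cannot be negative")
    total = time_do_s + time_ndo_s
    if total == 0:
        raise ValueError("index is undefined when neither object was explored")
    return (time_do_s - time_ndo_s) / total


# Hypothetical mouse that spent 30 s on the displaced object and 10 s on the other.
print(discrimination_index(30.0, 10.0))
```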


Lifespan assay 

An experienced technician checked the health condition of mice daily
and estimated whether a humane endpoint had been reached. A mouse
had reached a humane endpoint when it showed more than one of the
following conditions: (1) serious trauma; (2) no eating and drinking for
more than 24 h; (3) no response to gentle disturbance that lasted for
a long time; and (4) a rapid loss of more than 20% body weight. Some
mice were found dead in their home cage at the daily inspection.


Data reporting and statistics 

No statistical methods were used to predetermine sample size. The 
sample sizes in our experiments were determined from related published
analyses. The experiments were not randomized. All C. elegans
strains were synchronized and cultivated at 20 °C. The RNAi screen and
the behavioural, lifespan and stress assays were performed at room
temperature (roughly 20 °C). Behavioural and lifespan experiments
were repeated at least three times and investigators were blinded to
the genotypes or dsRNA treatments. Worms were picked at L4 stage
and raised to a range of ages, chosen in an unbiased manner for behavioural,
lifespan and imaging assays. In lifespan and male mating assays,
worms that crawled to the wall of the plate were not included in the data.
HEK293T cell lines (catalogue number SCSP-502) were ordered from
the cell bank of the Chinese Academy of Sciences. All procedures for
culturing mouse primary neurons and performing mouse behavioural
tests were approved by the Animal Care and Use Committee of the
Institute of Neuroscience, Chinese Academy of Sciences, Shanghai,
China. The investigators were blinded to genotypes during mouse
behavioural assays. For the novel-location recognition test, mice that
did not explore for more than 3 seconds in total for both objects during
training or testing were excluded from the analysis. Animals that had
discrimination indexes of more than 0.2 or less than -0.2 during training
were considered to have a significant location/object bias during
training and were also excluded from further analysis.
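
The two exclusion rules for the novel-location recognition test can be expressed as a single predicate. The sketch below assumes exploration times are recorded per object in seconds for the training and the testing phase; the function and parameter names are hypothetical, and the thresholds are taken from the paragraph above.

```python
MIN_TOTAL_EXPLORATION_S = 3.0   # mice must explore for more than this in total
MAX_TRAINING_BIAS = 0.2         # |discrimination index| allowed during training


def include_mouse(train_obj_a_s: float, train_obj_b_s: float,
                  test_do_s: float, test_ndo_s: float) -> bool:
    """Return True if a mouse passes both exclusion criteria."""
    train_total = train_obj_a_s + train_obj_b_s
    test_total = test_do_s + test_ndo_s
    # Rule 1: more than 3 s of total exploration in training and in testing.
    if train_total <= MIN_TOTAL_EXPLORATION_S or test_total <= MIN_TOTAL_EXPLORATION_S:
        return False
    # Rule 2: no marked location/object bias during training.
    training_index = (train_obj_a_s - train_obj_b_s) / train_total
    return abs(training_index) <= MAX_TRAINING_BIAS


# Hypothetical mice: balanced explorer, low explorer, biased explorer.
print(include_mouse(20.0, 18.0, 25.0, 15.0))
print(include_mouse(1.0, 1.5, 25.0, 15.0))
print(include_mouse(30.0, 10.0, 20.0, 20.0))
```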

We used GraphPad Prism 7 (GraphPad Software, Inc.) for statistical
analyses. We tested the normality of the data with the Shapiro-Wilk
normality test. We used the Brown-Forsythe test to examine differences in
variance between groups. We used: two-tailed Student's t-test to analyse
differences between two groups; one-way analysis of variance (ANOVA)
followed by Dunnett's correction test to analyse differences between multiple
groups; two-way ANOVA to analyse differences between multiple
groups with two variations; and a two-sided log-rank (Mantel-Cox)
test to analyse lifespan statistics. The variance of all plots and graphs
is represented as means ± s.e.m.; n refers to the number of worms, mice
or independent experiments. The significance of statistical differences
is indicated as: *P < 0.05; **P < 0.01; ***P < 0.001.
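
As an illustration of the reporting convention above, the following sketch computes a mean with its standard error and maps a P value to the asterisk notation. It is not a substitute for the tests run in GraphPad Prism; the function names and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, s.e.m.), where s.e.m. = sample s.d. / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for an s.e.m.")
    return mean(values), stdev(values) / sqrt(len(values))


def significance_label(p_value: float) -> str:
    """Map a P value to the asterisk convention used in the figures."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


m, s = mean_sem([1.0, 2.0, 3.0])
print(f"{m:.2f} ± {s:.2f}", significance_label(0.004))
```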

We used two publicly available microarray datasets, GSE1572 (ref.)
and GSE44772 (ref.), to analyse changes in gene expression between
normal and pathological ageing of human brains. All brain samples in
the dataset GSE1572 were used for analysis. In Fig. 4a (right panel) and
Extended Data Fig. 8b, control brain samples without neurodegenerative
disease in the dataset GSE44772 are regarded as normal ageing
brains; liquid-nitrogen-preserved samples with RNA integrity numbers
(RINs) of 5 or more are used for correlation analysis. We used samples
from ageing brains with Alzheimer's disease from dataset GSE44772 for
the correlation analysis shown in Extended Data Fig. 10a-h; we removed
samples that had RINs of less than 5, were from people younger than
50, were affected by Huntington's disease, or had been preserved on
dry ice. Pearson's correlation was performed using GraphPad Prism 7.
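
The sample-selection rules for the Alzheimer's disease correlation analysis can be written as a filter. This is a sketch under stated assumptions: the record layout, field names and diagnosis labels below are hypothetical and do not reflect the actual GSE44772 metadata columns.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BrainSample:
    sample_id: str
    rin: float           # RNA integrity number
    age_years: float
    diagnosis: str       # hypothetical labels: "alzheimer", "huntington", "control"
    preservation: str    # hypothetical labels: "liquid nitrogen", "dry ice"


def keep_for_ad_correlation(sample: BrainSample) -> bool:
    """Apply the four removal criteria to an Alzheimer's disease sample."""
    return (
        sample.diagnosis == "alzheimer"      # drops Huntington's disease and controls
        and sample.rin >= 5                  # drops RINs of less than 5
        and sample.age_years >= 50           # drops donors younger than 50
        and sample.preservation != "dry ice"
    )


def select_samples(samples: Iterable[BrainSample]) -> List[BrainSample]:
    return [s for s in samples if keep_for_ad_correlation(s)]


# Hypothetical records, for illustration only.
cohort = [
    BrainSample("s1", 7.1, 82, "alzheimer", "liquid nitrogen"),
    BrainSample("s2", 4.2, 79, "alzheimer", "liquid nitrogen"),
    BrainSample("s3", 6.5, 45, "alzheimer", "liquid nitrogen"),
    BrainSample("s4", 6.8, 66, "huntington", "liquid nitrogen"),
    BrainSample("s5", 7.4, 88, "alzheimer", "dry ice"),
]
print([s.sample_id for s in select_samples(cohort)])
```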


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

The raw sequence data generated here have been deposited in the
National Center for Biotechnology Information (NCBI) Sequence Read
Archive (https://www.ncbi.nlm.nih.gov/sra) under accession number
PRJNA554977. Source Data for Figs. 1-4 and Extended Data Figs. 1-10
are provided with the paper.


Code availability

All custom code used to generate figures is available at
https://github.com/SHYKON-YIN/nature.
Extended Data Fig. 1 | Categories and coexpression network of genome-wide
RNAi screening hits. a, Effects of representative positive clones on BAS-1
protein levels in Pbas-1::bas-1::gfp transgenic worms. White arrows indicate NSM
neurons. Scale bar, 20 µm. Images are representative of three independent
experiments. b, Effect of each screening hit on the expression of BAS-1::GFP.
The GFP fluorescence in NSM neurons of worms individually treated with the
59 screening hits, also shown in Supplementary Table 1, was normalized to that
in worms treated with the control RNAi. P values were determined by unpaired
two-tailed t-test (see Supplementary Table 1 for numbers of tested worms and
exact P values). c, Categories of genes corresponding to the 59 screening hits
identified by RNAi. d, Coexpression network of screening hits and their
partners. The network was constructed with GeneMANIA. Black and grey dots
indicate screening hits and their partners, respectively. Blue and orange dots
show screening hits whose human homologues are involved in age-related
neurodegeneration and cell senescence, respectively. e, Pharyngeal pumping
in TU3401 worms (which express a dsRNA channel, SID-1, in neurons) after RNAi
treatment in the presence of 20 µM 2-deoxy-5-fluorouridine (FUDR). The
genes from the top 20 screening hits were examined. Numbers of tested worms
are shown beneath the bars. Data shown are means ± s.e.m.; *P < 0.05; **P < 0.01;
***P < 0.001; ns, not significant (one-way ANOVA with Dunnett's test; see
Supplementary Information for exact P values).
Extended Data Fig. 2 | BAZ-2 and SET-6 regulate age-related decline in BAS-1
expression. a, Expression patterns of BAZ-2 and SET-6. Scale bar, 30 µm.
Images are representative of three independent experiments.
b, Representative fluorescence images (left) and quantitative analysis (right)
of BAZ-2 and SET-6 expression. We quantified only those cells whose nuclear
morphology was clearly visualized with DAPI. Scale bar, 10 µm. n = 14 and 18
worms for BAZ-2 and SET-6, respectively. c, d, Representative western blots (c)
and quantitative analysis (d) of age-related changes in endogenous SET-6 and
BAZ-2 protein levels in genome-edited baz-2;set-6 worms. n = 5
independent experiments. Tubulin expression is used as a reference. For gel
source data, see the Supplementary Figures. In d, each data point represents the
result of one independent experiment. e, Fluorescence images (left) and
quantitative analysis (right) of BAS-1 expression in baz-2 and set-6 mutant
worms. Quantitative analysis of BAS-1 levels was performed by measuring GFP
fluorescence intensity in the soma of NSM neurons. White arrows indicate NSM
neurons. Scale bar, 15 µm. The numbers of tested worms are shown beneath the
bars. All data shown are means ± s.e.m.; *P < 0.05; **P < 0.01; ***P < 0.001
(d, Kruskal-Wallis test; e, one-way ANOVA with Dunnett's test; see
Supplementary Information for exact P values).
Extended Data Fig. 3 | Deletion of baz-2 or set-6 extends lifespan and
enhances stress resistance via mechanisms related to dietary restriction
and mitochondrial function. a-c, Percentage survival of N2, baz-2, set-6 and
baz-2;set-6 worms under oxidative (a), ultraviolet (b) and heat (c) stress.
d, Lifespan curves of daf-16, daf-16;baz-2 and daf-16;set-6 mutant worms (left)
and their abilities to resist oxidative stress (right). e, Lifespan curves of eat-2,
eat-2;baz-2 and eat-2;set-6 mutant worms (left) and their abilities to resist
oxidative stress (right). f, Lifespan curves of clk-1, clk-1;baz-2 and clk-1;set-6
mutant worms (left) and their abilities to resist oxidative stress (right). In a-f,
for oxidative-stress assays, data shown are means ± s.e.m.; one-way ANOVA
with Dunnett's test; numbers of independent experiments are shown beneath
the bars. For heat shock, ultraviolet stress and lifespan assays, data represent
the sum of animals in multiple experiments; two-sided log-rank test. The
numbers of independent experiments and of tested hermaphrodites are
indicated in parentheses. In all assays, *P < 0.05; **P < 0.01; ***P < 0.001; ns, not
significant (see Supplementary Information for exact P values).
Extended Data Fig. 4 | Epigenetic regulators BAZ-2 and SET-6 localize at the
promoter region of target genes. a, Co-immunoprecipitation of BAZ-2 and
SET-6 using genome-edited baz-2;set-6 worms (left) or transgenic
worms expressing BAZ-2::FLAG and SET-6::HA (right). Images are
representative of four independent experiments. For gel source data, see
Supplementary Fig. 3. b, Representative western blots (left) and quantitative
analysis (right) of H3K9 methylation levels in N2, baz-2, set-6 and baz-2;set-6
worms. Normalized H3K9 methylation levels were calculated by normalizing
the ratio of H3K9 methylation and histone 3 levels to that of N2 worms. For gel
source data, see Supplementary Fig. 4. c, Peaks of BAZ-2- and SET-6-binding
sites in the region -1,000 bp to +1,000 bp around the transcription start site
(TSS). Only those peaks with a fold change of more than 2 are plotted. The y-axis
indicates the average read coverage normalized to the number of uniquely
mapped reads per million per genomic bin (bins = 1,000). d, Pie chart showing
the distribution of overlapping BAZ-2- and SET-6-binding sites in genomic
features. TTS, transcription termination site. e, f, ChIP-qPCR analysis of
endogenous BAZ-2 (e) or SET-6 (f) enrichment at nuclear genes encoding
mitochondrial proteins in genome-edited baz-2 and set-6 worms.
ChIP-qPCR data from N2 worms were used as a control. In b, the numbers of
independent experiments are shown beneath the bars; data are means ± s.e.m.;
*P < 0.05; **P < 0.01; ***P < 0.001 (b, one-way ANOVA with Dunnett's test;
e, f, two-tailed t-test; see Supplementary Information for exact P values).
Extended Data Fig. 5 | BAZ-2 and SET-6 regulate the expression of nuclear
genes encoding mitochondrial proteins. a, Scatter plots of mRNA-seq data
for N2 versus baz-2 mutant worms (left) and N2 versus set-6 mutant worms
(right). The x- and y-axes represent the log10-transformed transcripts per
million clean tags (TPM) expression values of N2 (x-axis), baz-2 (left, y-axis) and
set-6 (right, y-axis) animals. Differentially expressed genes (DEGs) were defined
through the parameters of false discovery rate (FDR) < 0.001 and |log2Ratio| > 1.
b, RT-PCR analysis of changes in the expression of nuclear genes encoding
mitochondrial proteins in baz-2 and set-6 mutant worms at day 1 (left) or day 7
(right) of adulthood. The numbers of independent experiments are shown
beneath the bars. c, Western blots (left) and quantitative analysis (right)
of nDNA-encoded ATP5A and mtDNA-encoded MTCO1 proteins. n = 3
independent experiments. For gel source data, see Supplementary Fig. 5.
In b, c, data shown are means ± s.e.m.; *P < 0.05; **P < 0.01; ***P < 0.001; ns, not
significant (one-way ANOVA with Dunnett's test; see Supplementary
Information for exact P values).
Extended Data Fig. 7 | Human homologues of BAZ-2 and SET-6. a, Diagram
showing the similarity between C. elegans BAZ-2 and SET-6 and their human
homologues. aa, amino acids; ANK, ankyrin repeats; BROMO, bromo domain;
DDT, 'DNA-binding homeobox and different transcription factors' domain;
MBD, methyl-CpG binding domain; PHD, plant homeodomain; SET, Su(var.)3-9,
Enhancer-of-zeste, Trithorax domain. b, Alignment of conserved domains in
BAZ-2 or SET-6 with those in their mammalian homologues. Identical and
conservative residues are highlighted in black and in grey, respectively.
Extended Data Fig. 8 | Mammalian Baz2b and Ehmt1 have a conserved role in
repressing mitochondrial function. a, b, Transcription levels of BAZ2B and
EHMT1 in the prefrontal cortex of human brains at different ages. Expression
values in a (n = 30 samples) are from the dataset GSE1572 (see ref.), and in b
(n = 145 samples) are from brain samples without neurodegenerative disease in
the dataset GSE44772 (see ref.). Pearson's r correlation coefficient was used
for statistical testing. c, Co-immunoprecipitation of BAZ2B and EHMT1 in
HEK293T cells. Images are representative of three independent experiments.
For gel source data, see Supplementary Fig. 6. d, ChIP-qPCR analysis of BAZ2B
or EHMT1 enrichment at nuclear genes encoding mitochondrial proteins in
HEK293T cells. Immunoglobulin G (IgG) antibody was used as a control.
e-g, Effects of downregulation of mouse Baz2b or Ehmt1 by short hairpin RNAs
(shRNAs) on the transcription of mitochondria-related genes (e), ATP level (f)
and oxygen-consumption rate (g) in primary mouse cerebellar neurons. F,
carbonyl cyanide 4-(trifluoromethoxy) phenylhydrazone (FCCP); NC, negative
control; O, oligomycin; R, rotenone. h, i, Oxygen-consumption rate (h) and ATP
content (i) in primary mouse cortical neurons overexpressing (OE) mouse
Baz2b or Ehmt1. Graphical OCR data are representative of three independent
experiments. In d-i, sample numbers are shown beneath the bars. Data shown
are means ± s.e.m. **P < 0.01; ***P < 0.001 (d, two-tailed t-test;
e-h, one-way ANOVA with Dunnett's test; i, Kruskal-Wallis test; see
Supplementary Information for exact P values).
Extended Data Fig. 9 | Effects of deleting Baz2b on mitochondrial function
and mouse behaviour. a, Diagram showing the generation of Baz2b-/- (KO)
mice. UTR, untranslated region; WT, wild-type. Protospacer-adjacent motif
(PAM) sequences are highlighted in green; the substitution site is highlighted
in red. b, Western blot analysis of Baz2b protein level in samples from WT,
Baz2b+/- and Baz2b-/- mice. Images are representative of three independent
experiments. For gel source data, see Supplementary Fig. 7. c, Oxygen-consumption
rates (OCRs) of mitochondria isolated from 12-month-old WT,
Baz2b+/- and Baz2b-/- male mice. A, antimycin; F, carbonyl cyanide
4-(trifluoromethoxy) phenylhydrazone (FCCP); O, oligomycin; R, rotenone.
n = 4 mice per group. P values were determined by one-way ANOVA with
Dunnett's test. d, e, Spontaneous locomotion of old (d) and young (e) mice in an
open field test. The numbers of tested mice are shown beneath the bars.
f, Escape latency in Barnes maze trials during training days for young WT,
Baz2b+/- and Baz2b-/- mice. Numbers of tested mice are indicated in
parentheses. In all assays, data shown are means ± s.e.m.
Extended Data Fig. 10 | Expression level of BAZ2B and EHMT1 correlates
positively with the progression of Alzheimer's disease in humans.
a, Expression level of BAZ2B (left) and EHMT1 (right) at different Braak stages in
the prefrontal cortex of brains with Alzheimer's disease. b, Expression level of
BAZ2B (left) and EHMT1 (right) at different stages of frontal atrophy in the
prefrontal cortex of brains with Alzheimer's disease. c-e, Correlations between
the expression levels of BAZ2B (x-axes) and selected nuclear genes (encoding
mitochondrial proteins; y-axes) in the prefrontal cortex of brains with
Alzheimer's disease. f-h, Correlations between the expression levels of EHMT1
(x-axes) and selected nuclear genes (encoding mitochondrial proteins; y-axes)
in the prefrontal cortex of brains with Alzheimer's disease. In a-h, expression
values (n = 390 samples) of examined genes are from the dataset GSE44772,
using Pearson's r correlation coefficient for statistical testing. i, Proposed
working model for the epigenetic regulation of mitochondrial function and
healthy ageing.
Antibodies


Antibodies used

(1) Mouse monoclonal anti-Flag-Tag antibody (same as Sigma's Anti-FLAG): Clone/3B9; Supplier/Abmart; Cat. No./M20008;
Lot No./283658; Dilution: 1/1000 for western blot.
(2) Rat monoclonal anti-HA antibody: Clone/3F10; Supplier/Roche; Cat. No./11867431001; Lot No./15645900 and 12177700;
Dilution: 1/3000 for western blot.
(3) Rabbit polyclonal antibody for detection of Histone H3 monomethylated on lysine 9 (H3K9me1): Supplier/Abcam; Cat. No./
ab9045; Lot No./GR3198013-1; Dilution: 1/1000 for western blot. Use 5 µg for 10 µg of chromatin in ChIP experiment.
(4) Mouse monoclonal antibody to Histone H3 dimethylation on lysine 9 (H3K9me2): Supplier/Abcam; Cat. No./ab1220; Lot No./
GR325223-4; Use 5 µg for 10 µg of chromatin in ChIP experiment.
(5) Rabbit polyclonal antibody to Histone H3 dimethylation on lysine 9 (H3K9me2): Supplier/Abcam; Cat. No./ab115159; Dilution:
1/1000 for western blot.
(6) Rabbit polyclonal antibody to Histone H3 (tri methyl K9): Supplier/Abcam; Cat. No./ab8898; Lot No./GR3217826-1
and GR148830-2; Dilution: 1/1000 for western blot. Use 5 µg for 10 µg of chromatin in ChIP experiment.
(7) Rabbit polyclonal antibody to Histone H3: Supplier/Abcam; Cat. No./ab1791; Lot No./GR94293-1; Dilution: 1/1000 for
western blot.
(8) Mouse monoclonal anti-FLAG M2 antibody: Clone/M2; Supplier/Sigma; Cat. No./F3165; Lot No./SL8Q7119V; Dilution:
1/1000 for western blot; Use 2 µg for 1 µg of chromatin in ChIP experiment.
(9) Mouse monoclonal anti-GLP/EHMT1 antibody: Clone/B0422; Supplier/R&D Systems; Cat. No./PP-B0422-00; Lot No./A-2;
Dilution: 1/1000 for western blot; Use 2 µg for 1 µg of chromatin in ChIP experiment.
(10) Monoclonal anti-α-Tubulin antibody produced in mouse: Clone/B-5-1-2; Supplier/Sigma; Cat. No./T6074; Dilution: 1/5000
for western blot.
(11) Rabbit polyclonal antibody against Baz2b was generated in collaboration with Abcam; not commercially available; Dilution:
1/1500 for western blot.
(12) ATP5A: Anti-ATP5A antibody [15H4C4] - Mitochondrial Marker: Supplier/Abcam; Cat. No./ab14748; Lot No./GR209582-8;
Dilution: 1/1000 for western blot.
(13) MTCO1: Anti-MTCO1 antibody [1D6E1A8]: Supplier/Abcam; Cat. No./ab14705; Lot No./GR233531-1; Dilution: 1/1000 for
western blot.
(14) Rabbit polyclonal antibody to β-Actin: Supplier/Abmart; Cat. No./P30002M; Lot No./294357; Dilution: 1/1000 for western
blot.
(15) IRDye 800CW Goat anti-Rabbit IgG Secondary Antibody: Supplier/LI-COR Biosciences; Cat. No./P/N 925-32211; Dilution:
1/5000 for western blot.
(16) IRDye 680RD Goat anti-Rabbit IgG Secondary Antibody: Supplier/LI-COR Biosciences; Cat. No./P/N 925-68071; Dilution:
1/1000 for western blot.


Validation

The rabbit polyclonal antibody against Baz2b was validated using our Baz2b knock-out mice; the result is shown in this
manuscript. All the other antibodies used in this study were validated by the suppliers; the information is available on their
websites.


Eukaryotic cell lines

The HEK293T cell line (catalogue number SCSP-502) was ordered from the cell bank of the Chinese Academy of Sciences.


The cell line has been validated using the short tandem repeat (STR) profiling method by the cell bank of the Chinese Academy of
Sciences.

No commonly misidentified cell lines were used.
C. elegans strains: The wild-type Bristol N2, set-6(ok2195), daf-16(mu86), eat-2(ad1116), clk-1(qm30), and TU3401
(sid-1(pk3321),uIs69) strains were obtained from the Caenorhabditis Genetics Center. baz-2(tm0235) was obtained from the
National Bioresource Project, Japan. set-6;baz-2 double mutant worms were generated by crossing set-6(ok2195) with
baz-2(tm0235). The TU3401;Pbas-1::bas-1::gfp strain was generated by crossing TU3401 with Pbas-1::bas-1::gfp transgenic
worms (SQC0017). The baz-2;Pbas-1::bas-1::gfp and set-6;Pbas-1::bas-1::gfp strains were generated by crossing baz-2(tm0235)
and set-6(ok2195) with Pbas-1::bas-1::gfp worms, respectively. The baz-2;daf-16 and set-6;daf-16 mutant worms were
generated by crossing daf-16(mu86) with baz-2(tm0235) and set-6(ok2195), respectively. The baz-2;eat-2 and set-6;eat-2
mutant worms were generated by crossing eat-2(ad1116) with baz-2(tm0235) and set-6(ok2195), respectively. The locus of
baz-2 is adjacent to that of clk-1, so the baz-2;clk-1 strain was generated by deleting baz-2 (yfh0100 allele) in the genomic
background of clk-1(qm30) mutant worms using the CRISPR-Cas9 system. The set-6;clk-1 worms were generated by crossing
set-6(ok2195) with clk-1(qm30). The Phsp-6::gfp transgenic strain $1410 was crossed with set-6(ok2195), baz-2(tm0235), and
set-6;baz-2 worms to express Phsp-6::GFP in these mutant worms.

C57BL/6J mice at embryonic day 14.5 and postnatal day 6 were obtained from Shanghai Laboratory Animal Center, Chinese Academy
of Sciences. The null mutation in Baz2b of C57BL/6J mice was generated by Suzhou Non-human Primate Facility, Institute of
Neuroscience, Chinese Academy of Sciences. Both male and female mice were used in this study. Breeding, housing, and
experimental procedures were performed following the procedure approved by the Institutional Animal Care and Use
Committee at the Institute of Neuroscience, Chinese Academy of Sciences.


This study did not involve wild animals.
Not involved.


All procedures for performing mouse behavioural tests were approved by the Animal Care and Use Committee of the Institute of
Neuroscience, Chinese Academy of Sciences, Shanghai, China.
The raw sequence data generated in this study were deposited in the NCBI Sequence Read Archive under accession
number PRJNA554977.


There is one ChIP-seq experiment each for BAZ-2/SET-6 binding, H3K9me1 and H3K9me2, and two replicates for the ChIP-seq experiment
of H3K9me3. Three batches of worms were collected for each ChIP-seq measurement.


We performed Illumina single-end 50 bp sequencing. The sequencing depth of each sample was as follows:
BAZ-2_GFP-ChIP Total reads: 27896954 Uniquely mapped reads: 20776365
BAZ-2_GFP-Input Total reads: 41170127 Uniquely mapped reads: 32719966
SET-6_GFP-ChIP Total reads: 26454834 Uniquely mapped reads: 18360852
SET-6_GFP-Input Total reads: 32004369 Uniquely mapped reads: 24228355

We performed Illumina paired-end 150 bp sequencing. The sequencing depth of each sample was as follows:
N2-a2-H3K9me1_ChIP Total reads: 24729875 Uniquely mapped reads: 27623314
N2-a2-H3K9me1_Input Total reads: 39345751 Uniquely mapped reads: 30387782
N2-a2-H3K9me2_ChIP Total reads: 28468453 Uniquely mapped reads: 19968945
N2-a2-H3K9me2_Input Total reads: 26552809 Uniquely mapped reads: 21519737
N2-a2-H3K9me3_ChIP Total reads: 28765524 Uniquely mapped reads: 21960227
N2-a2-H3K9me3_Input Total reads: 26552809 Uniquely mapped reads: 21519737
BAZ2-a2-H3K9me1_ChIP Total reads: 36799862 Uniquely mapped reads: 29242685
BAZ2-a2-H3K9me1_Input Total reads: 28694706 Uniquely mapped reads: 22470302
BAZ2-a2-H3K9me2_ChIP Total reads: 27119242 Uniquely mapped reads: 19672636
BAZ2-a2-H3K9me2_Input Total reads: 29085145 Uniquely mapped reads: 22446541
BAZ2-a2-H3K9me3_ChIP Total reads: 31181664 Uniquely mapped reads: 24045304
BAZ2-a2-H3K9me3_Input Total reads: 29085145 Uniquely mapped reads: 22446541
SET6-a2-H3K9me1_ChIP Total reads: 39567207 Uniquely mapped reads: 31080460
SET6-a2-H3K9me1_Input Total reads: 32115006 Uniquely mapped reads: 25974182
SET6-a2-H3K9me2_ChIP Total reads: 27634238 Uniquely mapped reads: 19793956
SET6-a2-H3K9me2_Input Total reads: 29907795 Uniquely mapped reads: 23793108
SET6-a2-H3K9me3_ChIP Total reads: 26680288 Uniquely mapped reads: 20452910
SET6-a2-H3K9me3_Input Total reads: 29907795 Uniquely mapped reads: 23793108
N2-a7-H3K9me1_ChIP Total reads: 15889684 Uniquely mapped reads: 12872030
N2-a7-H3K9me1_Input Total reads: 34280879 Uniquely mapped reads: 27097216
N2-a7-H3K9me2_ChIP Total reads: 33272702 Uniquely mapped reads: 23613681
N2-a7-H3K9me2_Input Total reads: 39690761 Uniquely mapped reads: 32441690
N2-a7-H3K9me3_ChIP Total reads: 42692956 Uniquely mapped reads: 31858518
N2-a7-H3K9me3_Input Total reads: 39690761 Uniquely mapped reads: 32441690
BAZ2-a7-H3K9me1_ChIP Total reads: 36216152 Uniquely mapped reads: 29353294
BAZ2-a7-H3K9me1_Input Total reads: 36518674 Uniquely mapped reads: 27660179
BAZ2-a7-H3K9me2_ChIP Total reads: 35811329 Uniquely mapped reads: 24767427
BAZ2-a7-H3K9me2_Input Total reads: 39064259 Uniquely mapped reads: 31913061
BAZ2-a7-H3K9me3_ChIP Total reads: 31221555 Uniquely mapped reads: 22026684
BAZ2-a7-H3K9me3_Input Total reads: 39064259 Uniquely mapped reads: 31913061
SET6-a7-H3K9me1_ChIP Total reads: 16264960 Uniquely mapped reads: 13156364
SET6-a7-H3K9me1_Input Total reads: 36626677 Uniquely mapped reads: 29121215
SET6-a7-H3K9me2_ChIP Total reads: 33701820 Uniquely mapped reads: 23610207
SET6-a7-H3K9me2_Input Total reads: 43091463 Uniquely mapped reads: 35286028
SET6-a7-H3K9me3_ChIP Total reads: 38185317 Uniquely mapped reads: 28398527
SET6-a7-H3K9me3_Input Total reads: 43091463 Uniquely mapped reads: 35286028
N2-a2-H3K9me3-replicate_ChIP Total reads: 62995763 Uniquely mapped reads: 60135755
N2-a2-H3K9me3-replicate_Input Total reads: 50887869 Uniquely mapped reads: 48979574
N2-a7-H3K9me3-replicate_ChIP Total reads: 41419670 Uniquely mapped reads: 40135660
N2-a7-H3K9me3-replicate_Input Total reads: 48541033 Uniquely mapped reads: 34973814
BAZ2-a2-H3K9me3-replicate_ChIP Total reads: 53384244 Uniquely mapped reads: 51462411
BAZ2-a2-H3K9me3-replicate_Input Total reads: 37493544 Uniquely mapped reads: 36256257
BAZ2-a7-H3K9me3-replicate_ChIP Total reads: 64916141 Uniquely mapped reads: 62942690
BAZ2-a7-H3K9me3-replicate_Input Total reads: 35799538 Uniquely mapped reads: 24687361
SET6-a2-H3K9me3-replicate_ChIP Total reads: 53978506 Uniquely mapped reads: 52229602
SET6-a2-H3K9me3-replicate_Input Total reads: 59507865 Uniquely mapped reads: 54687728
SET6-a7-H3K9me3-replicate_ChIP Total reads: 40578210 Uniquely mapped reads: 39186377
SET6-a7-H3K9me3-replicate_Input Total reads: 38282802 Uniquely mapped reads: 35132127


GFP-trap agarose beads (Chromotec ACT-CM-GFA0250); H3K9mel (Abcam ab9045), H3K9me2 (Abcam ab1220) and 
H3K9me3 (Abcarn ab&&98) 


All reads including ChiP and Input were aligned to the reference worm genome(WormBase, WS256) using the Bowtie 2 
aligner (version 2.3.3.1] with the following parameters: 

bowtie? -p 6-N 1 -x $index-U Sread -§ Ssam >> Slog 2>81 

For peak calling, we used the MACS software (version 2.1.0.20150731) with the following parameters: 

macs? callpeak -t $ChIP -c Sinput-n $name -g ce ~-outdir Sout -f BAM -p 0.05 -8; 

For bedgraph files, we used the following parameters 

bamCoverage-b $file -p 30 -e 300 ~binSize 10 ~centerReads ~normalizeUsing RPKM -o Sffile%.*).rpkm.bg 


Sequencing data quality was checked with the fastqc software. Quality ofthe alignment was checked with samtools flagstat. 
We optimized the fold enrichment cutoff for peaks selecting and finally identified 5714 significantly BAZ-2 occupied loci 
‘upon filtering high quality peaks based on atleast 2-fold enrichment and less than 5% FOR. Using the same criteria, we 
identified 5198 SET-6 peaks, For H3K9me1/2/3 histone modification ChiP-seq data, we first mapped to reference worm 
genome( WormBase, WS256) and then filtered out PCR duplicates and low quality reads. Finally, we generated bedgraph 
files using 10bp bins. 


Cutadapt (version 1.15) was used to trim adapter sequences.
Bowtie 2 aligner (version 2.3.3.1) was used to align reads to the genome.

SAMtools (version 1.5) was used to filter the SAM and BAM files to obtain uniquely mapped reads.

The HOMER (version 4.9, 2-20-2017) annotatePeaks.pl function was used to annotate peak files.

The intersect function of bedtools (version 2.26.0) was used to obtain the overlapping peaks of BAZ-2 and SET-6.

The bamCoverage function of deepTools (version 2.5.4) was used to generate bedgraph files.

The metaseq (version 0.5.5.4) framework with the RPM normalization method was used to plot the average read coverage
profile of all ChIP-seq datasets.
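
The per-sample listing of total and uniquely mapped reads lends itself to a quick check of the unique-mapping rate. The following is a minimal sketch, not part of the reported pipeline: the function name `parse_depth_line` and the regular expression are illustrative assumptions about the line layout used in this summary.

```python
import re

# Assumed layout of one line of the sequencing-depth listing, e.g.
# "N2-a7-H3K9me2_ChIP Total reads: 33272702 Uniquely mapped reads: 23613681"
LINE_RE = re.compile(
    r"^(?P<sample>\S+)\s+Total reads:\s*(?P<total>\d+)\s+"
    r"Uniquely mapped reads:\s*(?P<unique>\d+)\s*$"
)


def parse_depth_line(line):
    """Return (sample, total_reads, unique_reads, unique_fraction)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    total = int(match.group("total"))
    unique = int(match.group("unique"))
    if total == 0:
        raise ValueError("total read count is zero")
    return match.group("sample"), total, unique, unique / total


if __name__ == "__main__":
    example = ("N2-a7-H3K9me2_ChIP Total reads: 33272702 "
               "Uniquely mapped reads: 23613681")
    sample, total, unique, fraction = parse_depth_line(example)
    print(f"{sample}: {fraction:.1%} of reads uniquely mapped")
```

Applied to every line of the listing, the same function would flag any library whose uniquely mapped fraction falls well below the others.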
A mosaic of cross-phylum chemical interactions occurs between all metazoans and
their microbiomes. A number of molecular families that are known to be produced by
the microbiome have a marked effect on the balance between health and disease.
Considering the diversity of the human microbiome (which numbers over
40,000 operational taxonomic units), the effect of the microbiome on the chemistry
of an entire animal remains underexplored. Here we use mass spectrometry
informatics and data visualization approaches to provide an assessment of the
effects of the microbiome on the chemistry of an entire mammal by comparing
metabolomics data from germ-free and specific-pathogen-free mice. We found that
the microbiota affects the chemistry of all organs. This included the amino acid
conjugations of host bile acids that were used to produce phenylalanocholic acid,
tyrosocholic acid and leucocholic acid, which have not previously been characterized
despite extensive research on bile-acid chemistry. These bile-acid conjugates were
also found in humans, and were enriched in patients with inflammatory bowel disease
or cystic fibrosis. These compounds agonized the farnesoid X receptor in vitro, and
mice gavaged with the compounds showed reduced expression of bile-acid synthesis
genes in vivo. Further studies are required to confirm whether these compounds have
a physiological role in the host, and whether they contribute to gut diseases that are
associated with microbiome dysbiosis.


In total, we analysed 768 samples from 96 sample sites of 29 different
organs from 4 germ-free and 4 colonized mice by liquid chromatography-tandem
mass spectrometry (LC-MS/MS) and 16S rRNA gene sequencing (Supplementary
Table 1). Mapping the first principal coordinate position of each sample
from specific-pathogen-free (SPF) mice onto a three-dimensional (3D) mouse
model enabled us to visualize the similarity of the microbiome and
metabolome through all organs and organ systems (Fig. 1a, b; the 3D model
is available as Supplementary Data). Different sections through the
gastrointestinal tract had unique microbiome and metabolome profiles. There
was a distinct difference between the similarity of the two data types in
mouse faecal samples. The metabolome differed between faecal samples and
the distal gastrointestinal tract, whereas the microbiome was more
similar between faeces and colon or caecum samples.

Fig. 1 | Global effect of the microbiome on the chemistry of an entire
mammal. a, Three-dimensional model of mouse organs mapped with the mean
first principal coordinate (Extended Data Fig. 1) as a heat map (according to the
colour scale), from the germ-free and SPF mice (n = 4 mice each). Ad, adrenal
gland; bl, bladder; br, brain; caec, caecum; col, colon; cx, cervix; duo,
duodenum; er, ear; faeces; ft, feet; hd, hand; je, jejunum; kd, kidney; lg, lung;
lv, liver; mo, mouth; oes, oesophagus; ov, ovary; sto, stomach; tr, trachea; ut,
uterus; vg, vagina. b, Mean percentage and total number of unique spectra in
each organ sampled from the two mouse groups. c, Relative abundance
(normalized to total ion current (TIC)) of the 30 most differential metabolites
between the guts of germ-free and SPF mice. The metabolites are coloured as
secondary bile acids (blue), primary bile acids (red), soyasaponins (pink),
peptides (yellow) and unknown (brown). Annotations are based on spectral
matching or molecular network propagation (level two or three).
Stereochemistry of the annotated molecules cannot be discerned using these
methods. d, Mean and 95% confidence interval of the Shannon-Weiner
diversity of the metabolomic data in each sample from the gastrointestinal
tracts of germ-free and SPF mice. Statistical significance between metabolome
diversity in the same sample location between germ-free and SPF mice was
tested with a two-sided Mann-Whitney U-test, n = 4. *P = 0.028, "P = 0.057.
e, Results of meta-mass-shift chemical profiling showing the spectral counts
of known mass differences between unique nodes in either germ-free or SPF
mice. Each mass difference corresponds to the node-to-node gain or loss of a
particular chemical group.


Molecular networking of mouse data

To characterize the chemical effect of the microbiome, we subjected
the mass spectrometry data to molecular networking. The algorithm
identified 7,913 spectra, of which 14.7 ± 2.2% were observed in colonized
mice and 10.0 ± 0.7% were exclusive to germ-free mice (Fig. 1c, Extended
Data Fig. 1). Although the overall profiles showed that the strongest
differences between germ-free and SPF mice were in the gastrointestinal
tract, molecular networking identified unique chemical signatures
from the microbiome in all organs, ranging from 2% in the bladder to
44% in stools (Fig. 1b). The metabolome of the caecum, the main site
of microbial fermentation of food, was most markedly affected by
the microbiota. Spectral library searching enabled the annotation of
8.9% of nodes in the molecular network (level two or three, according
to previously published standards). Many of the changes attributed to
the microbiome were location-specific, resulting from the metabolism
of plant natural products from food and bile acids (Fig. 1c, Extended
Data Figs. 2-4, Supplementary Data).
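
Molecular networking connects tandem mass spectra whose fragmentation patterns are similar, with similarity scored as a cosine (the Fig. 2 legend gives a minimum cut-off of 0.7). The sketch below illustrates the idea with a plain binned cosine score. It is a simplification and an assumption on our part: the GNPS workflow uses its own scoring that also accounts for precursor mass differences, and the function names and the 0.02 bin width are hypothetical.

```python
import math


def bin_spectrum(peaks, bin_width=0.02):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(round(mz / bin_width))
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=0.02):
    """Plain cosine similarity between two binned spectra (0 to 1)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def are_connected(peaks_a, peaks_b, cutoff=0.7):
    """Two nodes are linked when their score reaches the cut-off."""
    return cosine_score(peaks_a, peaks_b) >= cutoff
```

With toy spectra that share one of two equally intense peaks, the score is 0.5 and the pair would not be linked at a 0.7 cut-off.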

In the upper gastrointestinal tract, the Shannon diversity of the
metabolomes of germ-free mice mirrored those of SPF mice; in both
sets of mice, diversity was low in the oesophagus and higher in the
stomach and duodenum. Upon transition to the caecum, however,
the diversity of the two groups of mice began to separate (Fig. 1d).
The molecular diversity in the caecum and colon of colonized mice was
higher than that of germ-free mice, but this was not the case in the stool
samples (Fig. 1d). In the duodenum (the location at which the gallbladder
adds bile to the intestine), there was a contrast in microbiome and
metabolome diversity: a high metabolome diversity corresponded to
a low microbial diversity (Fig. 1d, Extended Data Fig. 1).
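
For readers less familiar with the diversity metric, the Shannon index of one sample can be computed from its feature abundances as H = -Σ p_i ln p_i. The sketch below is illustrative only: the function name is hypothetical, and the use of the natural logarithm is an assumption because the text does not state the log base.

```python
import math


def shannon_diversity(abundances):
    """Shannon index H = -sum(p_i * ln p_i) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    h = 0.0
    for value in abundances:
        if value > 0:
            p = value / total
            h -= p * math.log(p)
    return h
```

A sample dominated by a single feature has an index near zero, whereas a sample with four equally abundant features has an index of ln 4.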

Molecular networking enabled meta-mass-shift chemical profiling
(an analysis of chemical transformations on the basis of parent mass
shifts between related spectra without the requirement of knowing the
molecular structures) of the gastrointestinal tracts of germ-free and
SPF mice. In colonized mice, there was a signature for water loss in the
duodenum and jejunum and the loss of H2, acetyl and methyl groups
in later parts of the gastrointestinal tract (Fig. 1e). Of all the H2 shifts,
23.1% were associated with bile acids, which indicates that colonization
resulted in the oxidation of bile acids (a known microbial
transformation). Deacetylations were also prevalent in colonized mice, although
the metabolites on which this occurred remain unidentified. Germ-free
mice had mass gains that corresponded to saccharides in all regions of
the gastrointestinal tract (Fig. 1e); these gains were primarily
associated with plant natural products, such as soyasaponins and flavonoids.
The absence of these sugars in SPF mice implicates the microbiome in
their metabolism (Extended Data Figs. 2, 3). A unique mass gain of C4H8
was detected in the jejunum and ileum of SPF mice (Fig. 1e) and 18.2%
of spectra with this mass gain were derived from an unknown molecule
related to the conjugated bile acid glycocholic acid (GCA) (Fig. 2a).
Overall, both germ-free and SPF mice had frequent and diverse mass
losses between related molecules, but in colonized mice there were
fewer molecules that gained a molecular group (Fig. 1e). This indicates
that the microbiome contributed more to the catabolic breakdown
of molecules, and less to anabolism. However, we found the addition
of C4H8 to GCA to be a particularly interesting anabolic reaction that
was dependent on the gut microbiome, and we sought to investigate
this further.
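
The logic of mass-shift profiling can be sketched as follows: take the parent-mass difference across each edge of the network, match it against a table of known chemical groups, and count the matches. This is an illustration under stated assumptions rather than the published algorithm. The group masses are computed from standard monoisotopic atomic masses, the 0.01 Da tolerance and the function names are hypothetical, and the GCA m/z of 466.316 is our calculated [M+H]+ value, not a number given in the text.

```python
from collections import Counter

# Standard monoisotopic atomic masses (Da).
H, C, O = 1.00782503, 12.0, 15.9949146

# Mass differences of the chemical groups discussed in the text.
KNOWN_SHIFTS = {
    "H2": 2 * H,
    "CH2": C + 2 * H,            # methyl-group shift
    "H2O": 2 * H + O,
    "C2H2O": 2 * C + 2 * H + O,  # acetyl-group shift
    "C4H8": 4 * C + 8 * H,
}


def annotate_shift(mz_a, mz_b, tol=0.01):
    """Label the shift from node a to node b, or return None if unknown."""
    delta = mz_b - mz_a
    for name, mass in KNOWN_SHIFTS.items():
        if abs(abs(delta) - mass) <= tol:
            return ("gain " if delta > 0 else "loss ") + name
    return None


def count_shifts(edges, tol=0.01):
    """Count annotated shifts over (mz_a, mz_b) edges of a network."""
    labels = (annotate_shift(a, b, tol) for a, b in edges)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    # Assumed GCA [M+H]+ (466.316) to the m/z 522.379 node from the text.
    print(annotate_shift(466.316, 522.379))
```

Note that the direction of a gain or loss depends on which node of an edge is treated as the starting point, so a real analysis needs a consistent convention for ordering each pair.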


Discovery of new conjugated bile acids 

Glycine- and taurine-conjugated bile acids were detected in both
germ-free and SPF mice. The glycine and taurine amino acids were removed as
they passed through the gastrointestinal tract in SPF mice only, which
is a known microbial transformation (Fig. 1b, Extended Data Fig. 4).
The molecular network of conjugated bile acids had several modified
forms of these compounds that were present only in colonized mice,
including the C4H8 addition that was related to the tandem mass spectra
of GCA (Fig. 2a). Our analysis of the tandem mass spectra of three of
these SPF-mouse nodes (m/z 556.363, 572.358 and 522.379) showed the
maintenance of the core cholic acid, but with a fragmentation pattern
that was characteristic of the presence of phenylalanine, tyrosine or
leucine through an amide bond at the conjugation site in place of glycine or
taurine (Extended Data Fig. 5, Supplementary Table 2). This represents
a set of unique amino acid amide conjugations to cholic acid that are
mediated by the microbiome, which create the newly identified bile
acids phenylalanocholic acid (Phe-chol), tyrosocholic acid (Tyr-chol)
and leucocholic acid (Leu-chol). These structures were validated with
synthesized standards by retention time and MS/MS matching on
several instrument platforms including targeted mass spectrometry (level
one matches) (Extended Data Figs. 5, 6, Supplementary Tables 2, 5).
These molecules were detected in the duodenum, jejunum and ileum of
SPF mice only, with tenfold-lower levels found in the caecum and colon
after targeted mass spectrometry analysis using isotopically labelled
internal standards (Supplementary Table 4). The liver-synthesized
glycine and taurine conjugates were not only found in these same gut
locations, but were also observed in the gall bladder and liver (Fig. 2b,
Extended Data Fig. 6). Phe-chol was the most abundant microbial
conjugate, on average, across the gastrointestinal tract; it was present at
147.0 nmol g⁻¹ tissue (s.d. ± 99.9) in the jejunum, 83.6 nmol g⁻¹ tissue
(s.d. ± 81.3) in the ileum, 4.7 nmol g⁻¹ tissue (s.d. ± 3.4) in the caecum and
1.6 nmol g⁻¹ tissue (s.d. ± 2.2) in the colon. Phe-chol was present at its
highest concentration at 447.2 nmol g⁻¹ tissue in a single sample from
the jejunum (limit of detection (LOD) in Supplementary Tables 4, 6, 7).
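
The three node masses quoted above can be checked against the proposed structures with a short calculation: an amide conjugate is cholic acid plus the amino acid minus one water, and the observed ion is assumed here to be the protonated molecule [M+H]+. The sketch below is a worked check rather than the authors' procedure. The elemental formulae and monoisotopic masses are standard values that we supply, and the function names are hypothetical.

```python
# Standard monoisotopic atomic masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.00727646

CHOLIC_ACID = {"C": 24, "H": 40, "O": 5}
WATER = {"H": 2, "O": 1}
AMINO_ACIDS = {
    "Gly": {"C": 2, "H": 5, "N": 1, "O": 2},
    "Phe": {"C": 9, "H": 11, "N": 1, "O": 2},
    "Tyr": {"C": 9, "H": 11, "N": 1, "O": 3},
    "Leu": {"C": 6, "H": 13, "N": 1, "O": 2},
}


def mono_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def conjugate_mz(amino_acid):
    """[M+H]+ of the cholic acid amide conjugate (condensation loses H2O)."""
    neutral = (mono_mass(CHOLIC_ACID)
               + mono_mass(AMINO_ACIDS[amino_acid])
               - mono_mass(WATER))
    return neutral + PROTON


if __name__ == "__main__":
    for name in ("Phe", "Tyr", "Leu", "Gly"):
        print(f"{name}-conjugate [M+H]+ = {conjugate_mz(name):.3f}")
```

Under these assumptions the calculated values for the phenylalanine, tyrosine and leucine conjugates agree with the reported m/z values of 556.363, 572.358 and 522.379 to within a few thousandths of a dalton, and the leucine conjugate differs from the glycine conjugate by the mass of C4H8.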

The decreased abundance of these unique bile conjugates in the
lower gastrointestinal tract prompted us to investigate whether there
was reabsorption in the ileum or further metabolism by the
microbiota. We collected portal and peripheral blood from an additional
four SPF and six germ-free mice, and screened for the presence of
conjugated bile acids. Both taurocholic acid and GCA were present
in the portal and peripheral blood of colonized and sterile mice, but
the newly identified amino acid amide conjugates were not detected
(Extended Data Fig. 6). Furthermore, incubation of these molecules
with an actively growing human faecal batch culture showed that the
Tyr-, Phe- and Leu-conjugated bile acids were not deconjugated by
the microbiota, even when deconjugation readily occurred on the
host-synthesized GCA control, a well-known amidate hydrolase activity
of bile acids that is mediated by the human microbiota (Extended Data
Fig. 6). However, oxidation of the cholate core occurred on all three
of the newly identified conjugates, which indicates that they could be
modified by microbial enzymes even when no concurrent oxidation
of GCA was observed (Extended Data Fig. 6).

In the extensive literature relating to bile acids (comprising more than
42,000 publication records in PubMed), descriptions of unusual
conjugations of bile acids are rare. Through 170 years of research into
bile-acid chemistry, the accepted standard has been that mammalian
bile acids are amide conjugated by a host liver enzyme (known as bile
acid-CoA:amino acid N-acyltransferase (BAAT)) with either glycine
or taurine. Here we report amide conjugations with phenylalanine,
tyrosine and leucine associated with the microbiome in mice, and show
that these compounds are common in humans.


Translation to humans 

We performed a search using the Mass Spectrometry Search Tool
(MASST) of 1,004 public datasets available in the Global Natural
Products Social Molecular Networking (GNPS) database, which revealed
spectral matches that correspond to Phe-chol, Tyr-chol and Leu-chol
in 28 studies comprising samples from the gastrointestinal tract of
both mice (3.2 to 59.4% of all samples) and humans (1.6 to 25.3% of all
samples) (Extended Data Fig. 7). In data from faecal samples collected
for the American Gut Project, at least one of these unique bile acids
was found in 1.6% of human faecal samples; Tyr-chol was the most
prevalent (n = 490 samples) (Fig. 3a). These bile acids were found at higher
frequency in samples from patients with inflammatory bowel disease
or cystic fibrosis, or from infants, than in samples from the American
Gut Project (Fig. 3a).

We reanalysed data deposited in the GNPS/MassIVE repository
from a previously published study of the mouse microbiome and liver
cancer, which enabled us to compare the abundance of the newly
identified bile-acid conjugates in mice fed a high-fat diet with
their abundance when the mice were fed a normal chow with or
without antibiotics (Extended Data Fig. 7). The Phe, Tyr and Leu amino acid
conjugates were undetectable upon exposure to antibiotics, whereas
GCA remained, supporting the role of the microbiome in the newly
identified conjugation. In the same study, Phe-chol and Leu-chol were
more abundant in mice fed a high-fat diet, with no change observed in
the host-conjugated GCA (Extended Data Fig. 7). We further validated
this association in data from a separate study in which
atherosclerosis-prone mice fed a high-fat diet also had increased levels of the microbial
conjugates, without a corresponding change in the host-produced
taurocholic acid (Extended Data Fig. 7). Cystic fibrosis is known to
result in insufficient production of pancreatic lipase, microbial
dysbiosis and the build-up of fat in the gut. On reanalysis of the public data
from a cohort of paediatric patients, we found that these compounds
were more prevalent in patients with cystic fibrosis (particularly in
those with pancreatic insufficiency) than in healthy controls (Fig. 3a).
Finally, detection of the newly identified conjugates in patients with
inflammatory bowel disease led us to mine metabolome data from
the second stage of the human microbiome project (HMP2), which
focused on differences between controls and patients with
inflammatory bowel disease, including patients with Crohn's disease or ulcerative
colitis, which are subtypes of inflammatory bowel disease (Fig. 3b,
Supplementary Table 8). All three metabolites were significantly higher in the
dysbiotic state associated with patients with Crohn's disease, but not
in patients with ulcerative colitis (Fig. 3b, Supplementary Data). Our
MASST-based mining of public data from the GNPS database showed
that these compounds are not only found in healthy humans but are
also enriched in individuals with fatty guts and inflammatory bowel
disease, which suggests that these compounds may have a
role in (or be symptoms of) gut dysbiosis and human disease.

Fig. 2 | Newly identified microbial bile-acid conjugates. a, Structures and
molecular networks of newly identified microbiome-conjugated bile acids,
with host-conjugated GCA shown for comparison. The molecular network is
coloured by mapping to germ-free or SPF mice (according to the colour
legend). Inset highlights the parent masses and mass differences between the
newly discovered molecules and GCA. Each node represents a clustered
tandem mass spectrum; connections between the nodes indicate relationships
through the cosine score with their width scaled by the cosine size (cut-off
minimum of 0.7). Circular nodes are unknown molecules, and arrowheads are
spectra with matches in the GNPS libraries. b, Dot plot of the area-under-the-
curve abundance of the newly identified and host-synthesized bile-acid
conjugates in each SPF mouse (n = 4), through the mouse gastrointestinal tract
and its subsections.

Fig. 3 | Presence, synthesis and function of microbial bile-acid conjugates.
a, Percentage of samples that were positive for the newly identified bile acids
from GNPS public datasets and from paediatric patients with cystic fibrosis
(compared to controls without cystic fibrosis). AGP, American Gut Project; CF,
cystic fibrosis; IBD, inflammatory bowel disease; PS, pancreatic-sufficient; PI,
pancreatic-insufficient. The colour coding of the bile acids applies to a-c.
b, Abundance of the newly identified conjugates in the PRISM and iHMP (NIH
Integrative Human Microbiome Project) datasets. The statistical significance
for the PRISM data was tested using Wald's test (Crohn's disease (CD),
n = 68 individuals; ulcerative colitis (UC), n = 53 individuals; noninflammatory
bowel disease, n = 34 individuals) and for the iHMP dataset with a linear
two-sided mixed-effects model. The iHMP comparisons are separated by type of
inflammatory bowel disease, and by dysbiotic or nondysbiotic state (for
ulcerative colitis, n = 12 dysbiotic and 110 nondysbiotic metabolomes; for
Crohn's disease, n = 48 dysbiotic and 169 nondysbiotic metabolomes; for
noninflammatory bowel disease, n = 15 dysbiotic and 107 nondysbiotic
metabolomes). Significance is shown using Benjamini-Hochberg-corrected
P values. Leu-chol, q = 0.031; Tyr-chol, q = 0.0074; Phe-chol, q = 0.0043.
*q<0.05, ""q<0.05. Boxes represent the interquartile range, notch is the 95%
confidence interval of the mean, centre is the median and whiskers are 1.5× the
interquartile range. c, Extracted ion chromatograms of Phe-chol from cultured
isolates of C. bolteae compared to medium control at 0 h and 96 h (top).
Experiment was performed twice. d, The ratio of ¹³C-Phe-chol:¹²C-Phe-chol in
faecal samples of a mouse fed a high-fat diet with ¹³C-labelled phenylalanine
(blue line) or unlabelled phenylalanine (black line) over time. Grey area
indicates a three-day period during which a high-fat diet was fed; red indicates
when the high-fat diet was supplemented with Phe. e, Quantitative PCR with
reverse transcription data showing the mean and s.e.m. of the gene-expression
ratio (ΔΔCt) of Fgf15, Shp, Cyp7b1 and Cyp7a1 to the 36B4 (also known as Rplp0)
reference control in the ileum and/or liver of mice gavaged with different bile
acids, compared to a mock control (corn oil) after 72 h. Statistical significance
was tested against the mock control with a two-tailed t-test (n = 4 or … mice per
group). CA, cholic acid.
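
The Fig. 3 legend reports significance as Benjamini-Hochberg-corrected P values (q values). For reference, the standard step-up adjustment can be sketched as below. This is a generic implementation of the published procedure, not the authors' code; the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values for four metabolites.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A metabolite is then called significant when its adjusted value falls below the chosen false-discovery threshold, such as the q < 0.05 level marked in the figure.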


Microorganisms make the new bile acids 

There was a strong positive correlation between the presence of a
species of Clostridium and all three bile acids when mice were fed a
high-fat diet (Pearson's r for Phe-chol, r = 0.73; for Tyr-chol, r = 0.50; and
for Leu-chol, r = 0.74) (Extended Data Fig. 7, Supplementary Table 3).
The clostridia are known to oxidize, epimerize and deconjugate bile
acids. We therefore cultured 20 human gut microorganisms (with
an emphasis on Clostridium species) in faecal culture medium that
contained amino acids and cholic acid precursors to screen for
production of the newly identified conjugates. The Clostridium bolteae strains
WAL-14578 and CC43_001B both synthesized Phe-chol and Tyr-chol
(Extended Data Fig. 8). The addition of labelled ¹³C-phenylalanine to
the medium verified that WAL-14578 could synthesize Phe-chol from the
amino acid and cholate precursors (Extended Data Fig. 8). Similarly,
we fed mice a high-fat diet with ¹³C-phenylalanine and were able to
detect labelled Phe-chol in their faeces, which demonstrates
microbial synthesis in vivo and shows that the amino acid precursors could
come from the diet (Fig. 3d). C. bolteae is a bile-resistant gut bacterium
that is more common in children with autism spectrum disorder, is
associated with abdominal infections and, together with Blautia
producta, prevented colonization by vancomycin-resistant
Enterococcus species in mice. The production of these bile acids by C. bolteae
further verifies their association with the microbiota of the mouse
gut, and implicates them as potentially important for intermicrobial
interactions in the gut microbiome. However, addition of the newly
identified conjugates to batch cultures of human faecal samples did
not affect community structure (Extended Data Fig. 8), which led us to
investigate how these compounds may affect gut physiology through
host receptor signalling.
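
The correlations quoted at the start of this section are Pearson coefficients between the abundance of a Clostridium species and each bile acid. A minimal sketch of the statistic follows. The function name and any input values are hypothetical, and the text does not specify how abundances were transformed before correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` would hold the per-sample relative abundance of the bacterial taxon and `y` the per-sample abundance of one conjugate.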


New bile acids and the farnesoid X receptor 

The farnesoid X receptor (FXR) is a key receptor for bile acids that is
expressed in the intestine, liver and other tissues. The most potent
naturally occurring agonistic ligand of FXR is chenodeoxycholic acid,
whereas tauro-β-muricholic acid is an FXR antagonist. To assess the
ability of the newly identified bile acids to affect human FXR
signalling, we established a luciferase reporter assay in human embryonic
kidney (HEK)293 cells. Phe-chol and Tyr-chol were strong human-FXR
agonists (Extended Data Fig. 9, Supplementary Table 9). The
phenylalanine conjugate (R² = 0.92, half maximal effective concentration
(EC50) = …5 μM) was twice as strong an agonist as chenodeoxycholic
acid (R² = 0.88, EC50 = 9.7 μM), and the tyrosine conjugate was the most
potent of them all (R² = 0.93, EC50 = 0.14 μM). Furthermore, gavage of
mice with these compounds increased expression of the FXR effector
genes Fgf15 and Shp (also known as Nr0b2) in the intestine (12.2- and
133-fold with Tyr-chol at 24 h, P = 0.029 and 0.009; 6.2- and 9.3-fold at
72 h, P = 0.009 and 0.019) (Fig. 3e, Extended Data Fig. 9). Although Shp
expression did not change detectably in the liver at 24 h after gavage,
levels were increased 2.3-fold after 72 h (P = 0.017) (Fig. 3e, Extended
Data Fig. 9). Changes in expression of the bile-acid synthesis genes
Cyp7a1 and Cyp8b1 also showed a time-dependent effect. Cyp7a1 was at
9% of control levels at 24 h (P = 0.001) and Cyp8b1 was at 69% (P = 0.004)
(Extended Data Fig. 9). At 72 h (after 4 gavages), Cyp7a1 expression was
at 8% of control levels (P = 0.004), and for Cyp8b1 the transcript was
further reduced to 2% (P = 0.0002) (Fig. 3e). The strong time-dependent
reduction in liver Cyp7a1 and Cyp8b1 transcripts indicates that, similar
to the primary bile acid cholic acid, gavage of mice with the newly
identified compounds reduced the expression of downstream
FXR-target genes that are responsible for bile-acid synthesis in the
liver. However, the possibility that this effect was due to FXR agonism
through release of cholate from amide conjugate hydrolysis cannot
be excluded.
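
An EC50 is the concentration that produces half of the maximal response on a fitted dose-response curve. The text does not state which curve model was fitted to the luciferase data, so the sketch below assumes a standard Hill (four-parameter logistic) form purely for illustration. The function name, the default Hill slope of 1 and the normalized 0-to-1 response range are all assumptions.

```python
def hill_response(conc, ec50, bottom=0.0, top=1.0, hill=1.0):
    """Response at a given agonist concentration under a Hill model.

    conc and ec50 must share the same units (for example micromolar).
    """
    if conc < 0 or ec50 <= 0:
        raise ValueError("conc must be >= 0 and ec50 must be > 0")
    if conc == 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** hill)


if __name__ == "__main__":
    # EC50 values quoted in the text for chenodeoxycholic acid and Tyr-chol.
    for name, ec50 in (("chenodeoxycholic acid", 9.7), ("Tyr-chol", 0.14)):
        print(name, round(hill_response(1.0, ec50), 3))
```

Under this model, a lower EC50 shifts the curve to the left, so at any fixed concentration the compound with the lower EC50 sits closer to its maximal response.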

Bile-acid metabolism by the microbiome was first described in the
1960s. The four known mechanisms of microbial metabolism are
dehydroxylation, dehydration and epimerization of the cholesterol
backbone, and deconjugation of the amino acids glycine or taurine.
Here, we identify bile-acid transformation by the microbiome mediated
by a fifth and completely different mechanism: amide conjugation of
the cholate backbone with the amino acids phenylalanine, tyrosine
and leucine. Although there are homologues of the human
bile-acid-conjugation gene BAAT in clostridial genomes, the microbial enzyme
in question remains unknown. Regardless of the mechanism of their
synthesis, the newly identified conjugates stimulate the human FXR
receptor in a cell-based system and, when administered to mice, reduce
the expression of FXR-target genes that are responsible for bile-acid
production in the liver. Additional studies are needed to understand
the health implications of bile-acid reconjugation by the human
microbiome and its potential effects on FXR-related diseases.


Conclusion 


This study shows that the chemistry of all organ systems is affected by
the presence of the microbiome. The strongest signatures come from
the gut, particularly via the breakdown of plant natural products from
food and the manipulation of bile acids. The microbiome is primarily
a catabolic entity, breaking down compounds through the enzymatic
removal of chemical groups. However, we found an anabolic reaction
that represents a fifth mechanism of bile-acid metabolism by the
microbiome, which operates through unique amino acid conjugations of
cholic acid. As the connections between humans and our microbial
symbionts become increasingly appreciated, a combination of globally
untargeted approaches and the development of tools that interlink
these datasets (such as the GNPS and MASST analysis infrastructure)
will enable the more-efficient characterization of microbial molecules
and efficient translation between model animals and human studies,
leading to a better understanding of the deep connection between our
microbiota, our metabolites and our health.


Reporting summary

Further information on research design is available in the Nature
Research Reporting Summary linked to this paper.


Data availability 

All metabolomics data that support the findings of this study are
available at GNPS (https://gnps.ucsd.edu/) under MassIVE ID
numbers: MSV000079949 (original germ-free and SPF mouse data),
MSV000082480, MSV000082467, MSV000079134, MSV000082406,
MSV000083032, MSV000083004 and MSV000083446. The sequencing
data for the germ-free and SPF mouse study are available on the Qiita
microbiome data analysis platform at https://qiita.ucsd.edu/ under
study ID 10801 and through the European Bioinformatics Institute
accession number ERP109688. Source Data for Figs. 1-3 and Extended
Data Fig. 7 are provided with the paper.


Code availability

MASST can be accessed at https://masst.ucsd.edu/; the development of
MASST is described in a separate reference. The code for MS/MS-based MASST
searching is available at
https://github.com/CCMS-UCSD/GNPS_Workflows/tree/master/search_single_spectrum.

Extended Data Fig. 1 | Microbiome and metabolome diversity in germ-free
and SPF mice. a, Principal coordinate (PC) analysis of microbiome and
mass-spectrometry data highlighted by sample source as germ-free (GF) or SPF
(n = 4 mice in each group). The microbial signatures from the germ-free mice
are an important control, which represents background reads found in buffers,
tips and tubes and other experimental materials. b, Data from a, highlighted by
organ source (n = 4 mice in each group). c, Bray-Curtis dissimilarities of the
metabolome data collected from mouse organs. The dissimilarities are
calculated within individual mice of the same group (germ-free or SPF, 'within')
or across the germ-free and SPF groups ('GF-SPF') (n = 4 mice in each group).
Only samples collected from the exact same location (subsection) are compared.
Significance was tested with a two-sided Mann-Whitney U-test. Boxes
represent the interquartile range (IQR), the notch is the 95% confidence
interval of the mean, the centre is the median and whiskers are 1.5× the IQR.
d, Microbiome profile of the gastrointestinal tracts of SPF mice. Data were
generated by sequencing 16S rRNA gene amplicons from each organ and organ
section, and analysed through the Qiita Deblur pipeline as described in the
Supplementary Methods. Bacterial taxa of relevance are colour-coded
according to the legend. e, Molecular network of LC-MS/MS data with nodes
coloured by source as germ-free, SPF, shared or detected in blanks. Molecular
families with metabolites annotated by spectral matching in GNPS are listed by
a number that corresponds to the molecular family. These are level-2 or -3
annotations according to the metabolomics standards consortium.
12-OAHSA, 12-(9Z-octadecenoyloxy)-octadecanoic acid.
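
Panel c compares samples using the Bray-Curtis dissimilarity, which is 0 for identical abundance profiles and 1 for profiles that share no features. A minimal sketch is given below. The function name is hypothetical, and the inputs are assumed to be non-negative abundance vectors over the same ordered set of features.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        raise ValueError("dissimilarity is undefined for two empty samples")
    return numerator / denominator
```

In the terms of the legend, a 'within' comparison would apply this function to the same organ subsection from two mice of one group, and a 'GF-SPF' comparison to the same subsection from one germ-free and one SPF mouse.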
Extended Data Fig. 2 | Microbial metabolism of soyasaponins in
metabolomics data from germ-free and SPF mice. n = 4 mice in each group.
a, Molecular network cluster of soyasaponins, coloured by source of each node
as germ-free, SPF or shared. Structures of corresponding molecules are shown
in nodes highlighted in yellow, according to the numbering scheme. Mean
total-ion-current normalized abundance of each soyasaponin metabolite from
the gastrointestinal tracts of germ-free and SPF mice. Ce, caecum; co, colon; D,
duodenum; I, ileum; J, jejunum; stl, stool; sto, stomach. Boxes represent the
IQR, the centre is the median and whiskers are 1.5× the IQR. n = 4 mice in each
group. b, Molecular family of soyasapogenols, their structures and relative
abundances in gut organs of germ-free and SPF mice (data are in the same
Three-dimensional model visualization (generated using 'ili) of
the normalized abundance of soyasaponin I in the mouse gastrointestinal
tract. The abundance of the metabolite is indicated according to the rainbow
spectrum (high, red; low, blue). n = 4 mice in each group. d, Three-dimensional
cartography (generated using 'ili) of the normalized abundance of
soyasapogenol B onto a magnetic resonance imaging organ model of
the mice. e, Mean normalized abundance of soyasaponin I through all
gastrointestinal sample locations in the germ-free and SPF mice. f, Mean
normalized abundance of soyasapogenol through all gastrointestinal
sample locations. The annotations are level two or three.
Extended Data Fig. 3 | Microbial metabolism of plant isoflavones in
metabolomics data from germ-free and SPF mice. a, Structures, molecular
network and total-ion-chromatogram-normalized abundance of glycone
isoflavanoids in the mouse gastrointestinal tract. Nodes are coloured
according to their source in germ-free or SPF mice (n = 4 mice each), and known
library hits are shaped as arrowheads. Boxes represent the IQR, the centre is the
median and whiskers are 1.5× the IQR. b, Same information as in a, for the
aglycones. c, Three-dimensional molecular cartography mapping the
abundance of the daidzein and glycitein glycone and sulfated forms through
the entire 3D mouse model. The normalized abundance of a particular molecule is
indicated as a heat map. Red, most abundant; blue, least abundant. d, Three-
dimensional molecular cartography mapping the abundance of the daidzein and
glycitein aglycone forms through the entire 3D mouse model. The
gastrointestinal-tract model is inset for reference. The annotations are level
two or three.
Extended Data Fig. 4 | Microbial metabolism of known bile acids in
metabolomics data from germ-free and SPF mice. n = 4 mice in each group.
a, Total-ion-chromatogram-normalized abundance of taurocholic acid and
secondary bile acids in gastrointestinal tract samples from germ-free and SPF
mice. Gall, gallbladder; liv, liver. Boxes represent the IQR, the centre is the
median and whiskers are 1.5× the IQR. b, Three-dimensional molecular
cartography mapping the abundance of the same bile acids as in a through the
mouse gastrointestinal-tract model; liver is separated for better visualization.
The normalized abundance of a particular molecule is indicated as a heat map.
Red, most abundant; blue, least abundant. The annotations are level two or
three.
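
The box-plot convention used throughout these captions (box spanning the IQR, centre line at the median, whiskers at 1.5× the IQR) can be sketched as follows. The helper name, the choice of quartile interpolation and the example values are assumptions for illustration; plotting software may compute quartiles slightly differently.

```python
import statistics


def box_stats(values):
    """Summarize data the way a box plot with 1.5x IQR whiskers does."""
    data = sorted(values)
    # Quartile interpolation differs between packages; 'inclusive' is one choice.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical normalized abundances with one extreme value.
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```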
Extended Data Fig. 5 | Mass spectrometry analysis of newly identified
conjugated bile acids. a, Extracted-ion-chromatogram MS1 traces of Tyr-chol
(m/z 572.37 ± 0.05 Da), Phe-chol (m/z 556.37 ± 0.05 Da) and Leu-chol (m/z
522.37 ± 0.05 Da). Experiments were performed four times. b, Extracted ion
chromatograms for the synthetic muricholic and cholic acid versions of the
Phe (m/z 556.37 ± 0.05), Tyr (572.37 ± 0.05) and Leu (522.37 ± 0.05) conjugates,
showing the different retention times from the muricholic- and cholic-acid
forms. c, Retention time alignments of synthetic muricholic- and cholic-acid
conjugates with the newly identified conjugates found in a sample from the
jejunum of a colonized mouse. The isoleucocholic- and leucocholic-acid
analysis was run on a long-gradient high-performance liquid-chromatography
column to separate isomeric Ile and Leu conjugates, and to compare to those
detected in vivo. d, Annotation of MS/MS fragmentation patterns for the three
conjugated bile acids and GCA. Structures of the immonium ions from amino
acid fragmentation, whole amino acid fragments and the major sterol fragment
are shown. Loss of the amino acid mass on the bile-acid steroid backbone is also
highlighted.
Extended Data Fig. 6 | Distribution and metabolism of newly identified
conjugated bile acids. a, Molecular network of MS/MS data from synthesized
amino acid conjugated bile acids and the duodenum of SPF mice. LC-MS/MS
data from synthetic standards were networked with mouse samples and
spectral matching. Molecular networking is indicated by node colouring.
Mirror plots show the alignment between the mouse and the synthetic
standards. Nodes shaped as arrowheads had hits in the GNPS libraries, and
node size is scaled to the spectral count. Tauro, taurocholic acid. These
experiments were performed twice. b, Three-dimensional molecular
cartography of the mean abundance of the newly discovered conjugates
mapped onto a 3D-rendered model of the mouse gastrointestinal tract, as a
heat map according to the colour scale. Organs are labelled as described in
Fig. 1. c, Molecular network of conjugated bile acids from portal and peripheral
blood of germ-free and SPF mice. Nodes are coloured by source as germ-free
portal, germ-free portal and peripheral blood, SPF portal and peripheral blood,
GF portal and peripheral blood and SPF peripheral blood, and all. Arrowhead
nodes represent known compounds in the GNPS spectral database; circular
nodes represent unknown compounds. The annotations were obtained
through spectral matches against reference libraries (level two or three).
d, Mean area-under-the-curve abundance and s.d. of bile acids of interest
during incubation with an actively growing batch human faecal culture for 24 h
(n = 3 independent incubations). e, Molecular network of newly identified
conjugated bile acids after incubation in a human faecal batch culture
experiment. Each node represents a unique tandem mass spectrum;
arrowhead-shaped nodes indicate known spectra in the GNPS database. The
nodes are coloured by their retention time according to the legend, and the
mass shifts between nodes are mapped onto the edge representing the cosine
connection between related spectra. The H2 mass shift representing oxidation
of the newly identified conjugates is shown. f, Mean ion intensity and s.d. of the
oxidized forms of Phe-chol, Tyr-chol and Leu-chol through the 24-h batch
faecal culture incubation (n = 3 independent incubations).
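
The 'cosine connection' between related spectra in panel e can be illustrated with a simplified sketch. It assumes that two MS/MS spectra have already been binned onto the same m/z grid and computes a plain cosine score; this is not the modified-cosine scoring that molecular networking tools may apply, and the function name and intensities are hypothetical.

```python
import math


def cosine_score(spec_a, spec_b):
    """Plain cosine similarity between two equally binned intensity vectors."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must share the same m/z bins")
    dot = sum(a * b for a, b in zip(spec_a, spec_b))
    norm_a = math.sqrt(sum(a * a for a in spec_a))
    norm_b = math.sqrt(sum(b * b for b in spec_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("spectra must contain at least one peak")
    return dot / (norm_a * norm_b)


# Hypothetical binned fragment intensities for two related spectra.
spectrum_1 = [0.0, 120.0, 35.0, 0.0, 400.0]
spectrum_2 = [5.0, 110.0, 40.0, 0.0, 390.0]
print(round(cosine_score(spectrum_1, spectrum_2), 3))
```

Scores near 1 indicate similar fragmentation patterns, which is why an edge in the network is read as structural relatedness between two nodes.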
Extended Data Fig. 7 | MASST search results and associations of newly
identified conjugated bile acids with high-fat diet. a, Proportion of samples
in which Phe-chol, Tyr-chol and Leu-chol were found from a single-spectrum
MASST search of publicly available data on GNPS. MassIVE dataset identifiers
are shown for each dataset, and are divided into mouse ('murine') or human
gastrointestinal samples. b, Box plots of the newly identified conjugates in a
previously published mouse study, in which mice were fed high-fat diet (HFD)
(n = 14 mice) or normal chow (NC) (n = 19 mice) (Gly, P = 0.72; Phe, P = 0.038; Tyr,
P = 0.083; Leu, P=9.4 x10") and dot plot of mice treated with (n = 27 mice) or
without antibiotics (Ab) (n = 415 mice). Boxes represent the IQR, the line is the
median and whiskers are 1.5× the IQR. Colour legend applies to both a and b.
c, Mean normalized abundance of the three newly identified conjugated bile
acids compared to taurocholic acid in mice (Apoe-knockout on a C57BL/6J
background) fed either a high-fat diet (n = 12 mice) or normal chow (n = 12 mice)
for 10 weeks. Faecal samples were collected and extracted in 50:50
methanol:water and analysed with LC-MS/MS metabolomics, as described in
the Supplementary Methods. The s.d. around the mean is shown, and
significance between a high-fat diet and normal chow at each time point is
tested with two-sided Student's t-test. ***P < 0.001. d, Correlations between
rarefied reads of a deblurred read assigned to a Clostridium sp. from
atherosclerosis-prone mice fed a high-fat diet over time (n = 12 mice).
The line of best fit is plotted using the lm method in the R statistical software;
grey area around the line of best fit is the 95% confidence interval.
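
The line of best fit in panel d was produced with R's lm; an ordinary least-squares fit of the same kind can be sketched in Python as follows. The function name and the data points are hypothetical, and the confidence band is deliberately omitted to keep the example short.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements: rarefied read counts against the
# normalized abundance of one conjugated bile acid.
reads = [10, 20, 30, 40, 50]
abundance = [0.9, 2.1, 2.9, 4.2, 4.9]
slope, intercept = fit_line(reads, abundance)
print(slope, intercept)
```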
Extended Data Fig. 8 | Synthesis of newly identified conjugated bile acids by
Clostridium. a, Dot plot of the measured production of Phe-chol and Tyr-chol
using a targeted liquid chromatography-mass spectrometry method for two
C. bolteae strains grown in faecal culture medium (FCM) with or without
labelled Phe (n = 2 independent cultures). b, The mean ratio and s.e.m. of
13C-Phe-chol:12C-Phe-chol from the same C. bolteae strains when grown with
faecal culture medium with 13C-labelled phenylalanine (bottom left)
(n = 2 cultures). c, Mean and s.d. of the Shannon index of human faecal batch
culture (n = 3 cultures) before and after 24-h growth exposed to conjugated
bile acids or a mock control. NS, not significant by Mann-Whitney U-test.
d, Box-and-whisker plots of concentration of Phe-chol and Tyr-chol in original
samples from the gut of SPF mice. Boxes represent the IQR, the centre is the
median and whiskers are 1.5× the IQR. n = 4 mice.
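
The Shannon index reported in panel c summarizes community diversity from taxon counts. A minimal sketch follows, assuming natural logarithms (the logarithm base used in the study is not stated here) and hypothetical read counts.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one observation")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical read counts per taxon in one faecal batch culture.
before = [400, 300, 200, 100]
after = [700, 200, 80, 20]
print(shannon_index(before), shannon_index(after))
```

A community dominated by a single taxon gives an index of 0, and the index is highest when all taxa are equally abundant.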
Extended Data Fig. 9 | Effect of newly identified bile acids on FXR. a, Mean
normalized luciferase activity as a readout of human FXR stimulation when
exposed to various conjugated and unconjugated bile acids, as a function of
the compound dose. n = 8 measurements, ± s.e.m. DCA, deoxycholic acid;
CDCA, chenodeoxycholic acid; T-βMCA, tauro-β-muricholic acid. b, Ileum
mean fold expression change compared to 36B4 control of various bile acids
after gavage in mice. Error bars are s.e.m. c, Liver fold expression change
compared to 36B4 control of various bile acids after gavage in mice.
Significance was tested with two-tailed t-test compared to the mock
corn-oil control. Error bars are s.e.m.
Group 2 innate lymphoid cells (ILC2s) regulate inflammation and immunity in
mammalian tissues. Although ILC2s are found in cancers of these tissues, their roles
in cancer immunity and immunotherapy are unclear. Here we show that ILC2s
infiltrate pancreatic ductal adenocarcinomas (PDACs) to activate tissue-specific
tumour immunity. Interleukin-33 (IL33) activates tumour ILC2s (TILC2s) and CD8+
T cells in orthotopic pancreatic tumours but not heterotopic skin tumours in mice to
restrict pancreas-specific tumour growth. Resting and activated TILC2s express the
inhibitory checkpoint receptor PD-1. Antibody-mediated PD-1 blockade relieves ILC2
cell-intrinsic PD-1 inhibition to expand TILC2s, augment anti-tumour immunity, and
enhance tumour control, identifying activated TILC2s as targets of anti-PD-1
immunotherapy. Finally, both PD-1+ TILC2s and PD-1+ T cells are present in most
human PDACs. Our results identify ILC2s as anti-cancer immune cells for PDAC
immunotherapy. More broadly, ILC2s emerge as tissue-specific enhancers of cancer
immunity that amplify the efficacy of anti-PD-1 immunotherapy. As ILC2s and T cells
co-exist in human cancers and share stimulatory and inhibitory pathways,
immunotherapeutic strategies to collectively target anti-cancer ILC2s and T cells may
be broadly applicable.


ILC2s are innate antigen-independent lymphocytes that regulate immunity
to pathogens and commensals in tissues. Although ILC2s have
been detected in cancers, their role in tumour immunity is unclear.


TILC2s infiltrate pancreatic cancers 

To investigate the role of ILC2s in cancer, we analysed tumour-infiltrating
lymphocytes in unselected primary human PDACs. We found intratumoral
cells that lacked immune cell lineage markers (lineage−) but
expressed markers of ILCs (CD25 and CD127) and ILC2s (IL33 receptor
(ST2, also known as IL1RL1 or IL33R) and GATA3) (Fig. 1a, Extended Data
Fig. 1a, Supplementary Table 1). These putative TILC2s were enriched
in 'hot' tumours (enriched in CD8+ T cells) from rare long-term PDAC
survivors when compared with 'cold' tumours from short-term survivors.
In addition, higher TILC2 frequencies correlated with longer
survival (Fig. 1b, Extended Data Fig. 1b). Higher bulk RNA expression
of the ILC2-activating cytokine IL33 in tumours, but not of any other
ILC-activating cytokine, was associated with longer survival (Fig. 1c,
Extended Data Fig. 1c, Supplementary Table 2). Furthermore, expression
of IL33, but not of other ILC-activating cytokines, correlated with
higher intratumoral immune cytolytic activity (Fig. 1c, Extended Data
Fig. 1c). Although these data assess RNA and not protein expression,
they suggest that IL33 and TILC2s activate anti-tumour immunity in
human PDAC.

We next looked for ILCs in tumours from mice in which PDAC development
is driven by mutated Kras and p53 (autochthonous KPC mice)
and orthotopic mouse models of PDAC (PDAC mice). In both models,
we detected TILC2s that were phenotypically similar to those in human
PDACs and to mouse ILC2s (Fig. 1d, Extended Data Fig. 1d-f). The
frequency of mouse ILC2s was increased in tumours, but not in adjacent
organs (Fig. 1d, Extended Data Fig. 1g), consistent with their tissue
residency, and ILC2s were depleted in Rag2−/− mice by targeting the
lymphocyte antigen CD90.2 (Fig. 1e, Extended Data Fig. 1h). Therefore, ILC2s
are conserved cells that expand locally in mouse and human PDACs.
Fig. 1 | IL33-dependent TILC2s infiltrate human and mouse pancreatic
cancer. a, Gating, frequency, and phenotype of ILCs from unselected patients
with PDAC. Grey curves, isotype controls; numbers, mean fluorescence
intensity. b, Frequency (top) and survival association (bottom) of ILC2s in
microarrays of tumour samples from short- and long-term PDAC survivors.
c, Bulk tumour IL33 mRNA association with survival and tumour cytolytic index
(CYT) in short- and long-term PDAC survivors. d, Gating and frequency of ILCs
in mice with PDAC. e, Intratumoral ILC frequency and number in Rag2−/− PDAC
mice treated with anti-CD90.2 or isotype (Iso) antibodies. f, Gating, frequency,


Seeking the signals that expand TILC2s, we found that IL33 was
the most highly expressed ILC-activating cytokine in tumours in both
PDAC and KPC mice (Extended Data Fig. 2a). IL33 was heterogeneously
expressed in both human and mouse PDACs (Extended Data
Fig. 2b, c) and maximally expressed in intratumoral myeloid cells
(Extended Data Fig. 2d, e). To understand the role of IL33 and TILC2s
in PDAC immunity, we studied TILC2 dependency on IL33 in IL33High
PDAC mice, to model IL33High, ILC2-enriched hot tumours in long-term
human PDAC survivors. The expansion and function of TILC2s were
IL33-dependent, as Il33−/− PDAC mice had reduced TILC2 number,
frequency (Fig. 1f, Extended Data Fig. 2f) and cytokine production
(Extended Data Fig. 2g) when compared to Il33+/+ PDAC mice. Recombinant
IL33 (rIL33) expanded ILCs in ILC-proficient Rag2−/− PDAC mice,
but not in ILC-deficient mice that lack both Rag2 and the gene encoding
the gamma subunit of the IL2 receptor (Il2rg, also known as γc)
(Rag2−/−γc−/− PDAC mice) (Extended Data Fig. 2h, i). Collectively, these
data show that IL33 expands TILC2s in PDACs.


TILC2s boost tumour immunity in tissues 

As ILC2s have tissue-specific phenotypes, we hypothesized that TILC2s
have tissue-specific effects on PDAC immunity. To test this, we compared
the effects of IL33 deficiency on pancreatic and skin tumour
growth (pancreatic TILC2s express ST2 whereas skin TILC2s do not;
Extended Data Fig. 2j). Compared with Il33+/+ mice, Il33−/− mice with
orthotopic PDAC had larger tumours, accelerated tumour growth, and
worse survival (Fig. 2a). By contrast, mice with subcutaneous PDACs
showed no IL33-dependent phenotype (Fig. 2b, Extended Data Fig. 2k).
Although these mice were fully backcrossed onto identical genetic
backgrounds, we confirmed that the differences were not due to potential
minor genetic mismatches by observing larger tumours in Il33−/− mice
compared to Il33+/+ littermates (Extended Data Fig. 2l). These
anti-tumour effects depended on IL33 produced by host haematopoietic
cells, as chimaeric mice transplanted with Il33−/− bone marrow had
larger tumours than did control mice (Extended Data Fig. 2m-o). RNA
and number of ILCs in Il33+/+ and Il33−/− PDAC mice. High and low in b, c defined
as higher or lower, respectively, than the median for the cohort. HR, hazard
ratio. d-f, Data were collected 4 (d, f) and 10 (e) days after tumour
implantation, pooled from two or more independent experiments with n ≥ 3
per group; each point indicates one mouse analysed separately. n, number of
tumours from individual patients or mice. Horizontal bars show medians.
P values determined by one-way ANOVA with Tukey's (a) and Kruskal-Wallis
multiple comparison (d) post tests, two-tailed Mann-Whitney test (b, e, f),
two-sided log-rank test (b, c, survival curves), and linear regression (c).


sequencing (RNA-seq) of purified CD45+ intratumoral immune cells
from Il33+/+ and Il33−/− mice with orthotopic PDACs showed that PDAC
immune cells from Il33−/− mice had diminished transcriptional signatures
of T cell activation and MHC class I antigen processing (Extended
Data Fig. 3a), suggesting that Il33−/− PDAC mice might have a defect
in T cell priming. Consistently, Il33−/− mice with orthotopic but not
subcutaneous PDACs had lower frequencies of global and activated
tumour-infiltrating CD8+ T cells and reduced central memory CD8+
T cells (TCM) in draining lymph nodes (DLN) but not distant lymph nodes,
with no consistent changes in other immune cell frequencies (Fig. 2c,
Extended Data Fig. 3b-e). Depletion of all T cells prevented the increase
in tumour size in Il33−/− compared to Il33+/+ mice (Fig. 2d), and
rIL33-treated Rag2−/− PDAC mice showed no differences in tumour weight
compared with untreated mice (Extended Data Fig. 4a), confirming that
the anti-tumour effects of IL33 were mediated by T cells. Orthotopic
tumours from Il33+/+ and Il33−/− PDAC mice also had similar histology
and collagen and fibroblast content (Extended Data Fig. 4b-d), and
rIL33 had no effects on tumour cells in vitro (Extended Data Fig. 4e-g),
showing that IL33 had no direct effects on tumour or stromal cells.
Together, these data show that IL33 activates tissue-specific cancer
immunity by potentially activating TILC2s to prime CD8+ T cells.
We next tested whether the tissue-specific effects of IL33 depended
on CD8+ T cells by contrasting the rejection phenotypes of KPC cells
expressing the CD8+ T cell rejection antigen ovalbumin (KPC-OVA cells)
at different tissue sites. Notably, 70% of Il33+/+ mice rejected orthotopic
KPC-OVA tumours, whereas 0% of Il33−/− mice did. By contrast, 100%
of Il33+/+ and Il33−/− mice rejected subcutaneous KPC-OVA tumours
(Fig. 2e). To assess whether this phenotype resulted from ILC2 deficiency
and ineffective CD8+ T cell priming, we acutely depleted ILC2s
and examined antigen-specific CD8+ T cells in DLNs in the iCOS-T mouse,
in which diphtheria toxin depletes ILC2s while sparing ICOS+CD4+
T cells (Fig. 2f, Extended Data Fig. 5a). ILC2 depletion recapitulated
the Il33−/− phenotype; mice with orthotopic KPC-OVA tumours showed
a lower rate of tumour rejection and larger tumour size, whereas those
with subcutaneous tumours showed no differences (Fig. 2f), with an
Fig. 2 | The IL33-ILC2 axis activates tissue-specific cancer immunity.
a, b, Tumour weight, tumour volume, and survival of Il33+/+ and Il33−/− mice with
orthotopic (a) or subcutaneous (b) PDACs. c, Frequency of all (left) and
IFN-γ-producing (right) CD8+ T cells in orthotopic Il33+/+ and Il33−/− PDAC tumours.
d, Tumour weight in T cell-depleted Il33+/+ and Il33−/− mice with orthotopic
PDACs. e, Frequency of tumour rejection (orthotopic and subcutaneous) and
tumour weight (orthotopic) in Il33+/+ and Il33−/− mice with KPC-OVA PDACs.
f, Experimental design (left), frequency of tumour rejection (middle), and
tumour weight (right) of KPC-OVA PDAC tumours in iCOS-T mice with intact or


anticipated varied phenotype compared to Il33−/− mice due to differences
in time of rejection assessment and depletion efficacy. Tetramer
analysis in mice with orthotopic KPC-OVA tumours depleted of ILC2s
revealed a reduced frequency of OVA-specific CD8+ T cells in DLNs and
spleens, and a reduced frequency of CD8+ TCM cells in DLNs (as seen in
Il33−/− mice) (Fig. 2g, Extended Data Fig. 5b, c). Therefore, ILC2 deficiency
partially phenocopied IL33 deficiency. Although direct effects
of IL33 on CD8+ T cells cannot be ruled out, we found no ST2 expression
on intratumoral CD8+ T cells (Extended Data Fig. 5d). To summarize,
these loss-of-function experiments suggest that the IL33-TILC2 axis
primes tissue-specific CD8+ T cell PDAC immunity.
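
Differences in per cent tumour rejection between groups (Fig. 2e, f) are assessed with a χ² test. As a minimal sketch of that kind of comparison, the following computes a Pearson χ² statistic without continuity correction for a 2×2 table. The function name and the group sizes are hypothetical; 7 of 10 versus 0 of 10 is chosen only to mirror the 70% and 0% rejection rates, and the actual group sizes are not given in this passage.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]], with rows as groups and columns as outcomes."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: (rejected, not rejected) in two groups of 10 mice.
chi2, p = chi_square_2x2(7, 3, 0, 10)
print(chi2, p)
```

With groups this small, an exact test would often be preferred; the sketch shows only the arithmetic behind the statistic.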

Next, we investigated whether rIL33 treatment had similar tissue-specific
anti-tumour effects. Treatment with rIL33 prevented tumour
establishment in mice with orthotopic PDACs and prolonged survival,
but had no effect in mice with subcutaneous PDACs, which showed
progressive tumour growth and ulceration requiring euthanasia
(Fig. 3a). rIL33 had similar tissue-specific anti-tumour effects in
KPC-OVA PDAC mice (Extended Data Fig. 6a). Similarly, rIL18, a cytokine
that preferentially activates IL18R+ skin ILC2s, restricted the growth
of subcutaneous PDACs infiltrated by IL18R+ ILCs, but not of orthotopic
PDACs, which lack IL18R+ ILCs (Fig. 3b, Extended Data Fig. 6b).
rIL33 selectively expanded ILC2s in DLNs and tumours (Fig. 3c) but not
in spleens of mice with orthotopic PDACs or in any organs in mice with
subcutaneous PDACs (Extended Data Fig. 6c, d). ILC2 expansion was
accompanied by enhanced intratumoral CD8+ T cell cytokine capacity
and PD-1 upregulation (Extended Data Fig. 6e), with no consistent
changes in other intratumoral immune cells (Extended Data Fig. 6f),
although potential modulation of their function cannot be ruled out.
Consistent with indirect priming of anti-tumour CD8+ T cells by ILC2s,
rIL33 treatment doubled the number of intratumoral CD103+ dendritic
cells (DCs) (Fig. 3d, Extended Data Fig. 6g), which prime and recruit
depleted ILC2s. g, Gating (top) and frequency (bottom) of OVA-specific CD8+
T cells in DLNs of iCOS-T mice with intact or depleted ILC2s in orthotopic
KPC-OVA PDACs. Data were collected 14 days (a, c, d), 28 days (b), 42 days (e), and
8 days (f, g) after implantation. Mean ± s.e.m.; horizontal bars show median.
Data pooled from two or more independent experiments with n ≥ 4 per group; n
and data points denote individual mice analysed separately. P values
determined by two-tailed Mann-Whitney test (a-g), two-sided log-rank test
(a, b, survival curves), two-way ANOVA with Sidak's multiple comparison test
(a, b, tumour volumes), and χ² test (e, f, per cent rejection).


CD8+ T cells into PDACs. To determine whether the effects of rIL33
depended on ILC2s, we administered rIL33 to PDAC-bearing
Rorafl/flIl7rCre/+ mice, which are constitutively deficient in ILC2s. ILC2 deficiency
(Extended Data Fig. 6h) abrogated the efficacy of rIL33 (Fig. 3e) and
attenuated increases in CD103+ DCs in tumours (Fig. 3f). rIL33 also
had no anti-tumour effects (Fig. 3g) and failed to induce PD-1 expression
in intratumoral CD8+ T cells (Extended Data Fig. 6i) in CD103+
DC-deficient Batf3−/− mice, showing that CD103+ DCs are essential for
rIL33-mediated tumour control. To test whether TILC2s produced
chemokines to recruit DCs into tumours, we used single-cell RNA-seq
(scRNA-seq; Extended Data Fig. 7a-c, Supplementary Table 3) and
found that rIL33-activated TILC2s retained markers of ILC2 identity but
exhibited distinct transcriptional profiles (Extended Data Fig. 8a-e),
and selectively expressed Ccl5 (Extended Data Fig. 8f). CCL5 recruits
CD103+ DCs into tumours and induced efficient DC migration in vitro
(Fig. 3h). Together, these data suggest that rIL33 expands TILC2s to produce
CCL5, potentially recruit CD103+ DCs into tumours, and activate
CD8+ T cells to induce therapeutic tumour immunity.


PD-1 blockade activates TILC2s

As stimulating ILC2s with rIL33 had anti-tumour effects, we searched
for strategies to further activate ILC2s. Recent data have shown that,
like T cells, ILC2s regulate their activity through coinhibitory immune
checkpoint pathways. Specifically, the immune checkpoint PD-1 regulates
mouse ILC2 development and marks effector ILCs, and when PD-1
is genetically deficient or inhibited with a blocking antibody (anti-PD-1),
IL33-activated ILC2s show increased expansion and effector function
in mice and humans. PD-1+ ILC2s are also found in human tumours.
However, inhibiting immune checkpoints on ILC2s for cancer therapy
has been unexplored.
Fig. 3 | ILC2s stimulate tissue-specific cancer immunity by recruiting
intratumoral dendritic cells. a, Tumour weight, tumour volume, and survival
in orthotopic and subcutaneous PDAC mice treated with vehicle or rIL33.
b, Tumour weight and volume in mice with orthotopic or subcutaneous PDACs
treated with vehicle or rIL18. c, Gating, frequency, and number of ILC2s in
rIL33-treated mice with orthotopic PDACs (DLN vehicle, n = 13, top; tumour vehicle,
n = 12, bottom; DLN rIL33, n = 14, top; tumour rIL33, n = 15, bottom). d, Gating
and frequency of CD103+ DCs in tumours from rIL33-treated mice with
orthotopic PDACs. e, f, Tumour weight and volume (e) and frequency of CD103+


Using scRNA-seq (Extended Data Fig. 7a-c), we found that PD-1 was
the only detectable coinhibitory molecule expressed at baseline by
TILC2s (Extended Data Fig. 9a). Treatment with rIL33 upregulated PD-1
on a fraction of TILC2s but not on DLN ILC2s (Extended Data Fig. 9b),
suggesting that PD-1 may functionally restrain activated TILC2s. We
therefore tested whether combining rIL33 with anti-PD-1 treatment
could cooperatively activate TILC2s to enhance anti-tumour efficacy.
Consistent with the expression of PD-1 only on rIL33-activated TILC2s,
anti-PD-1 alone induced a partial anti-tumour response (Fig. 4a, as
previously reported in PDACs) but did not appreciably alter TILC2
frequencies (Fig. 4b, Extended Data Fig. 9c). A combination of rIL33
and anti-PD-1 maximally expanded ILC2s in tumours and DLNs (Fig. 4b)
and enhanced tumour control compared to anti-PD-1 alone (Fig. 4a).
To investigate whether anti-PD-1 activated ILC2s by cell-intrinsic PD-1
blockade, we compared the single-cell transcriptional profiles of TILC2s
and DLN ILC2s following in vivo treatment. Whereas TILC2s retained
the transcriptional and cellular identities of ILC2s irrespective of
treatment (Extended Data Fig. 9d), rIL33 and anti-PD-1 induced a unique
transcriptional phenotype in TILC2s in PDAC mice compared to all other
treatment conditions and tissue sites (Fig. 4c), and increased expression
of ILC2 markers, canonical (amphiregulin (Areg)) and non-canonical
(Cxcl2) ILC2 effector molecules, cellular activation machinery (Junb,
Fosl2, and Ybx1), and coinhibitory immune checkpoints (Extended
Data Fig. 9e-i). Finally, the anti-tumour effects of dual therapy were
abrogated in ILC2-deficient mice (Fig. 4d), which shows that ILC2s are
necessary for the efficacy of dual anti-PD-1 and rIL33 therapy. These
results suggest that anti-PD-1 amplifies activated TILC2s by possibly
inhibiting the PD-1 pathway on ILC2s, and not on T cells.
DCs (f) in tumours from rIL33-treated wild-type (WT) and ILC2-deficient mice
with orthotopic PDACs. g, Tumour volume in rIL33-treated WT and CD103+
DC-deficient Batf3−/− mice with orthotopic PDACs. h, Migration of purified DCs
towards CCL5. Data were collected (c, d) or 7 (e, f) weeks after tumour
implantation. Median ± s.e.m.; horizontal bars show median. Data were pooled
from at least two independent experiments, with n ≥ 3 per group; n and data
points denote individual mice analysed separately or individual replicates (h).
P values determined by two-sided log-rank test (a, survival curves), two-way
ANOVA (a, b, e, g, tumour volume), and two-tailed Mann-Whitney test (a-f, h).


PD-1 inhibits cell-intrinsic TILC2 function

To investigate whether interrupting the PD-1 pathway on activated
TILC2s contributed to the anti-tumour effects of dual therapy, we
transferred sort-purified rIL33-activated PD-1-proficient (wild-type)
or PD-1-deficient (Pdcd1−/−) TILC2s into tumour-bearing ILC2-deficient
mice (Fig. 4e, Extended Data Fig. 10a). Transfer of wild-type TILC2s
had no anti-tumour efficacy in established tumours, but transfer of
Pdcd1−/− TILC2s restricted tumour growth, indicating that interrupting
PD-1 signalling on TILC2s can enhance tumour control (Fig. 4e).
We next tested whether rIL33-activated PD-1+ TILC2s could directly
amplify the efficacy of anti-PD-1 therapy in established tumours. We
transferred sort-purified rIL33-activated congenic CD45.1+ TILC2s into
CD45.2+ ILC2-deficient mice with established tumours, and treated
the mice with anti-PD-1 after the transfer (Fig. 4f). Transferred TILC2s
were more than 97% PD-1+ (Extended Data Fig. 10b), accumulated in
the tumours and DLNs but not spleens of anti-PD-1-treated recipient
mice, and persisted for up to 9 weeks post-transfer (Fig. 4g). Transfer of
PD-1+ TILC2s augmented the efficacy of anti-PD-1 treatment, restricted
tumour growth (Fig. 4f), and increased cell frequencies in the tumours
and DLNs, but not spleens, of recipient mice (Fig. 4h). These data show
that blocking PD-1 signalling on rIL33-activated TILC2s directly amplified
the anti-tumour efficacy of anti-PD-1 treatment.

To examine the efficacy of rIL33 and anti-PD-1 treatment in IL33Low,
anti-PD-1-resistant tumours, we selected an aggressive cold PDAC
tumour model (KPC 52 mice) that generates IL33Low tumours (Extended
Data Fig. 2b), has 50% fewer CD8+ T cells than IL33High tumours, and has
a median survival of only 2 weeks (Extended Data Fig. 10c), to mimic
Fig. 4 | PD-1 blockade activates TILC2s. a, b, Tumour volume (vehicle, n = 8;
rIL33, n = 13; anti-PD-1 + rIL33, n = 14) and survival (vehicle, n = 9; rIL33, n = 10;
anti-PD-1 + rIL33, n = 15) (a) and gating, frequency, and number of ILC2s (b) in
treated PDAC mice. c, scRNA-seq (n = 7,022 single ILC2s) in treated PDAC mice
in a nonlinear representation of the top 15 principal components; cells are
coloured by cluster (left) or treatment and tissue (right). d, Tumour volume in
wild-type (WT) and ILC2-deficient PDAC mice treated with anti-PD-1 and rIL33.
e, TILC2s were sort purified from rIL33-treated WT or Pdcd1−/− PDAC mice and
transferred into ILC2-deficient PDAC mice, and tumour volumes were
measured. f-h, TILC2s were sort purified from rIL33-treated PDAC CD45.1
donor mice and transferred into ILC2-deficient CD45.2 PDAC recipient mice,
which were then treated with anti-PD-1. f, Experimental design, tumour volume
and tumour weight; g, frequency of CD45.1 and CD45.2 cells; h, frequency of
T cells in recipient mice 9 weeks after cell transfer (TILC2−: all groups,


the immunological and survival features of patients with IL33Low PDAC
tumours who show short-term survival. Although KPC 52 PDAC mice
do not exhibit the sequential steps of PDAC tumorigenesis from
pre-invasive neoplasias to invasive PDAC that are seen in spontaneous KPC
mice, they recapitulated the anti-PD-1 resistance seen in spontaneous
KPC mice and human PDACs (Fig. 4i). Combination treatment with rIL33
and anti-PD-1 reduced tumour volume by over 50% in these mice, with a
nearly 50% improvement in survival (Fig. 4i). Finally, to assess the potential
to use dual rIL33 and anti-PD-1 therapy to treat patients with PDAC,
we investigated the co-occurrence of PD-1+ TILC2s and PD-1+ T cells in
human PDACs. Nearly 60% of human PDACs had low frequencies of
PD-1+ TILC2s and PD-1+ T cells, with a significant correlation between
the two cell types (Extended Data Fig. 10d), which suggests that they
frequently co-occur in human PDAC. In addition, IL33 mRNA correlated
TILC2+: spleen, n = 9; DLN, n = 7; tumour, n = 7). Frequencies in g represent 
percentage of live donor- or recipient-derived immune cells. i, Tumour volume 
(vehicle, n = 13; other groups, n = 10) and survival (vehicle and anti-PD-1, n = 15; 
rIL33, n = 24; rIL33 + anti-PD-1, n = 26) of treated PDAC mice (KPC 52 cells). 
ILC2-deficient, Rora^fl/fl Il7r^Cre/+. Data were collected 5 weeks (b), 10 days (c), or weeks 
(d) after orthotopic tumour cell implantation. Median ± s.e.m.; horizontal bars 
show median. Data pooled from at least two independent experiments with 
n ≥ 3 per group; n and data points denote individual mice analysed separately. 
Data for scRNA-seq represent pooled purified single cells from biological 
replicates (vehicle n = 10, rIL33 n = 5, anti-PD-1 + rIL33 n = 5). P values determined 
by two-way ANOVA with Tukey's multiple comparison post test (a, d-f, i, 
tumour volume), two-tailed Mann-Whitney test (b, d, f, g, h), and two-sided 
log-rank test (a, survival curves). 


substantially with PDCD1 mRNA, which encodes PD-1 (Extended Data 
Fig. 10e). PD-1 expression has been associated with longer survival, 
which suggests that the IL33-PD-1 axis may positively impact survival in 
individuals with PDAC. In summary, rIL33-activated ILC2s can amplify 
responses to anti-PD-1 in both tumours that are partially sensitive to 
PD-1 and those that are PD-1-resistant. 


Discussion 

Our results suggest that ILC2s can be activated as a broader strategy 
to prime CD8+ T cells in cancers (Extended Data Fig. 10f). However, 
given the tissue-specific phenotypes of ILC2s, more work is required 
to determine whether activating them will have similar effects in cancers 
arising in different tissues. Given the divergent effects of ILC2s on 


tumour immunity in different tissues, our findings also underscore the 
need for pre-clinical cancer studies to be performed in native organs 
to accurately reflect the local immune environments. 

Immune checkpoints modulate ILC2 function, but the ability to 
harness ILC2s with immune checkpoint blockade for cancer therapy 
has remained unclear. We have shown that blocking PD-1 on activated 
ILC2s promotes anti-tumour effects, suggesting that ILC2s may par- 
tially contribute to the efficacy of PD-1 pathway blockade in human 
cancers. More broadly, this highlights that differential responses to 
immune checkpoint blockade may depend on tissue-specific factors. 
Refining strategies to identify ILC2s in human cancers will clarify their 
prognostic and predictive potential. 

As activated ILC2s (Extended Data Fig. 10g) and T cells share sev- 
eral immune modulatory molecules and co-exist in human cancers, 
a broader array of checkpoints could be co-targeted on ILC2s and 
T cells in tumours. Further investigations to collectively target ILC2s 
and T cells for cancer immunotherapy are therefore warranted. 
Methods 


Mice 
C57BL/6 (wild-type, WT, CD45.2), C57BL/6 CD45.1, Rag2−/−, Rag2−/−γc−/−, 
Batf3−/−, and Pdcd1−/− mice were purchased from Jackson Labs. Il33^Cit/+ and 
Il33^Cit/Cit mice were a gift from M. J. Rosen. Cd4^Cre/+ Icos^fl-DTR/+ and Rora^fl/fl 
Il7r^Cre/+ mice were a gift from A. N. J. McKenzie and have been previously 
described. For all experiments, 6-12-week-old mice were matched 
by age and sex and randomly assigned to specific treatment groups, 
with at least two independent experiments performed throughout. 
Pdx1^Cre LSL-Kras^G12D/+ LSL-Trp53^R172H/+ (KPC) mice have been previously 
described. Sample sizes for experiments were determined without 
formal power calculations. Animals were bred and maintained in a specific 
pathogen-free animal facility, and all experiments were conducted 
in accordance with an Institutional Animal Care and Use Committee 
(IACUC) approved protocol at Memorial Sloan Kettering Cancer Center 
(MSKCC) and in compliance with all relevant ethical regulations. 


Cell lines and animal procedures 

All tumour cell lines were derived from KPC mice. KPC 4662 cells from 
Pdx1^Cre LSL-Kras^G12D/+ LSL-Trp53^R172H/+ mice (a gift from R. H. Vonderheide) 
were transfected with GFP and used for all experiments unless indicated 
otherwise. KPC 8-1, 18-3, and 52 cells derived from Ptf1a^Cre 
LSL-Kras^G12D/+ LSL-Trp53^R172H/+ mice were a gift from C. Iacobuzio-Donahue. 
KPC 4662 cells engineered to express OVA were previously described 
(a gift from R. H. Vonderheide). All cell lines were authenticated as 
bona fide PDAC cell lines based on histopathologic verification by a 
dedicated pancreatic cancer pathologist. Orthotopic tumours established 
with KPC 4662 cells were IL33^High and transiently decreased in size 
in response to anti-PD-1 therapy initiated at the time of implantation 
(anti-PD-1 partial sensitivity). Orthotopic tumours established with 
KPC 52 cells were IL33^Low and did not decrease in size in response to 
anti-PD-1 therapy initiated at the time of implantation (anti-PD-1 resistant). 
All cell lines were regularly tested using the MycoAlert Mycoplasma 
Detection Kit (Lonza). Orthotopic PDAC tumours were established 
as previously described. In brief, mice were anaesthetized using a 
ketamine-xylazine cocktail, and a small (7-mm) incision was made 
into the left abdominal side. Tumour cells (10° KPC cells per mouse; 
1.25 x 10° KPC-OVA cells per mouse) were suspended in Matrigel (Bec- 
ton Dickinson), diluted 1:1 with cold phosphate-buffered saline (PBS) 
(total volume of 50 µl), and injected into the tail of the pancreas using 
a 26-gauge needle. Successful injection was verified by the appearance 
of a fluid bubble without intraperitoneal leakage. The abdominal wall 
was closed with absorbable Vicryl RAPIDE sutures (Ethicon), and the 
skin was closed with wound clips (Roboz). For subcutaneous PDAC 
tumours, tumour cells (10° KPC cells per mouse; 1.25 x 10° KPC-OVA 
cells per mouse) were resuspended in sterile PBS (Fisher Scientific) and 
implanted subcutaneously. Mice were euthanized at the indicated time 
points and processed for histology or flow cytometry. Autochthonous 
KPC mice were euthanized when tumours were detectable by ultrasound. 
Tumour volumes were measured using serial ultrasound (Vevo 
2100 Linear Array Imaging and Vivo LABORATORY Version 3.11, Fuji 
Film Visual Sonics) for orthotopic tumours as previously described. 
For subcutaneous tumours, tumour length and width were measured 
every 2-3 days using calipers, and tumour volumes were calculated as 
volume = length/2 × width². For survival analyses, survival was determined 
by a tumour volume of ≥500 mm³ or mouse health requiring 
euthanasia as defined by institutional IACUC guidelines. No mouse 
tumours exceeded IACUC-defined maximal tumour volumes of ≥2 cm³. 
No blinding was performed in experimental mouse interventions, as 
knowledge of the treatment groups was required. 
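
As an illustration of the calliper calculation described above, the following is a minimal sketch in Python. It assumes that the volume formula reads length/2 × width² and that the survival endpoint is a volume of at least 500 mm³; the function names and the default threshold argument are hypothetical and are not taken from the study's own analysis code.

```python
def caliper_tumour_volume(length_mm: float, width_mm: float) -> float:
    """Return tumour volume in mm^3 as length / 2 * width^2 (assumed reading)."""
    if length_mm < 0 or width_mm < 0:
        raise ValueError("calliper measurements must be non-negative")
    return length_mm / 2.0 * width_mm ** 2


def reached_survival_endpoint(volume_mm3: float,
                              threshold_mm3: float = 500.0) -> bool:
    """True once the volume meets or exceeds the (assumed) endpoint threshold."""
    return volume_mm3 >= threshold_mm3


# Example: a 12 mm x 8 mm tumour gives 12 / 2 * 8^2 = 384 mm^3.
volume = caliper_tumour_volume(12.0, 8.0)
print(volume, reached_survival_endpoint(volume))
```

Health-based euthanasia criteria, which also define the survival endpoint in the text, are not modelled here.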


T cell depletion 
CD4+ and CD8+ cells were depleted by intraperitoneal (i.p.) injection of 
250 µg of anti-mouse CD4 antibody (clone GK1.5, Bio X Cell, InVivoPlus) 
and 250 µg of anti-mouse CD8α antibody (clone 2.43, Bio X Cell, InVivoPlus). 
Control mice were treated with rat IgG2b isotype control (clone 
LTF-2, Bio X Cell, InVivoPlus). Mice were treated daily for 3 days before 
tumour implantation, and then every 3 days for the duration of the 
experiment. CD4+ and CD8+ T cell depletion was confirmed by flow 
cytometric analysis of tumours and secondary lymphoid organs (>85% 
depletion). 


ILC depletion 

ILCs were depleted in Rag2−/− mice by i.p. injection of 300 µg of anti-mouse 
CD90.2 (clone 30-H12, Bio X Cell) on days 0, 1, 3, 6, 9, and 13 
following tumour implantation as previously described. ILC2s were 
depleted in Cd4^Cre/+ Icos^fl-DTR/+ experimental mice and Cd4^+/+ Icos^fl-DTR/+ control 
mice treated by i.p. injection of diphtheria toxin (Sigma-Aldrich) at 
a dosage of 25 ng per gram of mouse body weight. Mice were treated the 
day before tumour implantation and then every other day thereafter for 
a total of five doses, as previously described. ILC2 depletion was confirmed 
by flow cytometric analysis of tumours (Extended Data Fig. 5a). 
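
The diphtheria toxin regimen above (25 ng per gram of body weight, starting the day before implantation and then every other day for five doses in total) can be sketched as a small calculation. This is only an illustrative helper: the function names are hypothetical, and day 0 is assumed to be the day of tumour implantation.

```python
from typing import List


def dt_dose_ng(body_weight_g: float, ng_per_g: float = 25.0) -> float:
    """Diphtheria toxin dose in ng for one mouse, scaled by body weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return body_weight_g * ng_per_g


def dt_schedule(n_doses: int = 5, first_day: int = -1,
                interval_days: int = 2) -> List[int]:
    """Dosing days relative to implantation (day 0 is an assumed convention)."""
    return [first_day + i * interval_days for i in range(n_doses)]


print(dt_dose_ng(20.0))   # dose for a 20 g mouse
print(dt_schedule())      # days on which the five doses fall
```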


Bone marrow chimaeras 

Bone marrow was removed from CD45.2 congenically labelled donor 
mice, filtered through a 70-µm filter, centrifuged, and resuspended 
in sterile PBS to a concentration of 10° live cells per 200 µl. CD45.1 congenically 
labelled C57BL/6J recipient mice were irradiated (5.5 Gy × 2, 
6 h apart) 24 h before bone marrow transplant and were maintained 
on enrofloxacin water for 4 weeks after irradiation. A single-cell suspension 
of CD45.2 bone marrow chimaera in sterile PBS (10° live cells 
per recipient mouse) was transplanted to each recipient mouse by 
retroorbital injection. Reconstitution was confirmed by flow cytometry 
of the peripheral blood at 4 and 8 weeks post transplantation. 
Tumour implantation experiments were performed at 12 weeks post 
transplantation. 


Recombinant IL33, IL18, and PD-1 blockade 
For rIL33, mice were treated with i.p. injections of 500 ng of carrier-free 
recombinant mouse IL33 (R&D Systems) in sterile PBS daily for 7 days, 
and then every 2 days thereafter as previously described. For rIL18, 
mice were treated with i.p. injection of 2 µg of carrier-free recombinant 
mouse IL-18 (R&D Systems) in sterile PBS at days 3, 7, 11, and 15 
after tumour inoculation as previously described. The chimaeric 
anti-mouse PD-1 antibody (4H2) used in this study was engineered as a 
mouse IgG1 isotype monoclonal antibody (mAb) and was shown to bind 
to CHO transfectants expressing PD-1 and to block binding of PD-L1 and 
PD-L2 to these cells. The affinity of 4H2 for mouse PD-1, determined 
by surface plasmon resonance using PD-1-Fc, was 4.68 x 10° M. The 
antibody was produced and purified at Bristol Myers Squibb (BMS). 
Each batch was certified to have <0.5 EU/mg endotoxin and be of >95% 
purity. All dosing solutions were prepared in PBS. Mice were treated 
with i.p. injection of 250 µg anti-PD-1 every 2 days. Transient reduction 
in tumour size but subsequent regrowth while on continuous anti-PD-1 
treatment was defined as a partial response. No reduction in tumour 
size while on continuous anti-PD-1 was defined as resistance. 


Human samples 

All tissues were collected at MSKCC following study protocol approval 
by the MSKCC Institutional Review Board. Informed consent was 
obtained from all patients. The study was performed in strict compli- 
ance with all institutional ethical regulations. All tumour samples were 
surgically resected primary PDACs. 


Tissue microarray. Tissue microarrays (TMAs) were constructed from 
tumour and adjacent non-tumour cores from formalin-fixed, paraffin-embedded 
tissue blocks from short-term survivors (n = 45 tumours, 5 
normal tissues) and long-term survivors (n = 51 tumours, 5 normal tissues) 
of PDAC as previously described. Patient subsets were randomly 
selected to undergo tissue microarray construction. Patients treated 
with neoadjuvant therapy were excluded. All tumours were subjected 
to pathological re-review and histological confirmation by two expert 
PDAC pathologists before analysis. Long-term survivors were defined 
as patients with overall survival of >3 years from surgery and short-term 
survivors as patients with survival >3 months and <1 year from surgery, 
to exclude perioperative mortalities. ILC2^High and ILC2^Low were defined 
as greater or lesser, respectively, than the median ILC2 frequency for 
the entire TMA cohort. 
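
The cohort definitions above can be written as a short sketch. It is illustrative only: the function names, the use of months as the time unit (3 years taken as 36 months, 1 year as 12 months), and the handling of patients whose ILC2 frequency equals the cohort median (left unassigned, because the text does not say) are all assumptions.

```python
from statistics import median
from typing import Dict, Optional


def survivor_group(overall_survival_months: float) -> Optional[str]:
    """Classify a patient by overall survival from surgery."""
    if overall_survival_months > 36:        # >3 years
        return "long-term"
    if 3 < overall_survival_months < 12:    # >3 months and <1 year
        return "short-term"
    return None                             # outside both definitions


def split_by_median(ilc2_frequency: Dict[str, float]) -> Dict[str, str]:
    """Label each patient ILC2-high or ILC2-low relative to the cohort median."""
    cutoff = median(ilc2_frequency.values())
    labels = {}
    for patient, freq in ilc2_frequency.items():
        if freq > cutoff:
            labels[patient] = "high"
        elif freq < cutoff:
            labels[patient] = "low"
        else:
            labels[patient] = "unassigned"  # tie with the median: assumption
    return labels


print(survivor_group(40), survivor_group(6), survivor_group(2))
print(split_by_median({"p1": 0.1, "p2": 0.5, "p3": 0.9}))
```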


Tumour transcriptomic profiling. Patient subsets were randomly 
selected to undergo transcriptomic profiling as previously described’. 
Patients in the TMA cohort with tumour tissue available for transcriptomic 
assessment were included in analyses in Fig. 1b to allow protein 
confirmation of RNA expression. Extracted RNA was qualified on an 
Agilent BioAnalyzer and quantified by fluorometry (Ribogreen). Prepa- 
ration of RNA for whole-transcriptome expression analysis was done 
using the WT Pico Reagent Kit (Affymetrix). Reverse transcription was 
initiated at the poly-A tail as well as throughout the entire length of 
RNA to capture both coding and multiple forms of non-coding RNA. 
RNA amplification was achieved using low-cycle PCR followed by linear 
amplification using T7 in vitro transcription technology. The cRNA 
was then converted to biotinylated sense-strand DNA hybridization 
targets. The prepared target was hybridized to GeneChip Human Tran- 
scriptome Array 2.0 (Affymetrix). Washes were performed using the 
GeneChip Hybridization, Wash and Stain Kit using a Fluidics Station 
450/250. Arrays were scanned using the GeneChip Scanner 3000. Data 
analysis for the array was done using Affymetrix Expression Console 
Software (SST-RMA algorithm to summarize the signal from array 
probesets). Immune cytolytic activity was determined as previously 
described. 


Cell isolation 

Mouse and human PDAC tumours and adjacent pancreata were 
mechanically dissociated and incubated in collagenase (collagenase 
II for mouse tumours, collagenase IV for human tumours, both 
5 mg/ml; Worthington Biochemical Corp., Fisher Scientific), DNase I 
(0.5 mg/ml; Roche Diagnostics), and Hank’s balanced salt solution 
(Gibco, Fisher Scientific) for 30 min at 37°C. Digestion was then 
quenched with fetal bovine serum (FBS, Life Technologies), and cells 
were filtered sequentially through 100- and 40-µm nylon cell strainers 
(Falcon, Fisher Scientific). Lymph nodes were mechanically dissociated 
and filtered through 100- and 40-µm nylon cell strainers (Falcon, 
Fisher Scientific) using PBS with 1% FBS (Life Technologies). Spleens 
were mechanically dissociated and filtered through 70- and 40-µm 
nylon cell strainers (Falcon, Fisher Scientific) using PBS with 1% FBS, followed 
by RBC lysis (RBC lysis buffer, Thermo Fisher Scientific). Mouse 
Fc receptors were blocked with FcεRIII/II-specific antibody (1 µg per 
1 × 10° cells; clone 2.4G2, Bio X Cell). 


ILC2 adoptive transfer 

CD45.1 C57BL/6 or Pdcd1−/− orthotopic PDAC mice were treated with 
500 ng of carrier-free recombinant mouse IL33 (R&D Systems) in sterile 
PBS daily for 10 days. Live, CD45+, lineage−, CD90+, CD25+, ST2+ TILC2s 
were sort-purified to 98% purity at day 10 post-implantation using an 
Aria Cell sorter (BD Biosciences). TILC2s (5 × 10° cells) were immediately 
transferred to orthotopic PDAC tumour-bearing Rora^fl/fl Il7r^Cre/+ CD45.2 
mice on days 7 and 14 post-tumour implantation via i.p. injection. Control 
mice received equivalent volumes of PBS via i.p. injection. Anti-PD-1 
treatment in recipient mice was initiated on the day of ILC2 cell transfer. 
Tissues were collected at indicated time points. 


Flow cytometry 
Single-cell suspensions were stained using antibody cocktails in 
the dark at 4 °C, washed, and analysed on a FACS LSR Fortessa (BD 


Biosciences). Mouse ILCs were defined as live, CD45+, lineage− (CD3, 
CD5, NK1.1, CD11b, CD11c, CD19, FcεR1), CD25+, CD127+ cells, as previously 
described. Mouse immune cells were defined as follows: 
ILC2s: live, CD45+, lineage−, CD25+, ST2+ cells; central memory T cells 
(Tcm): live, CD45+, CD3+, NK1.1−, CD8+, CD62L+, CD44+; dendritic 
cells (DC): live, CD45+, CD3−, NK1.1−, Gr-1−, F4/80−, CD11c+, MHC-II+; 
B cells: live, CD45+, CD3−, CD19+; T cells: live, CD45+, CD3+; CD4+ 
T cells: live, CD45+, CD3+, CD4+; CD8+ T cells: live, CD45+, CD3+, CD8+; 
regulatory T cells: live, CD45+, CD3+, CD4+, FoxP3+; tumour-associated 
macrophages: live, CD45+, CD11b+, F4/80+, GR1−; myeloid-derived suppressor 
cells (MDSCs): live, CD45+, CD3−, CD11b+, F4/80−, GR1+. Mouse 
cells were stained with the following antibodies: from Biolegend, 
CD45 (clone 30-F11, Pacific Blue), CD45.1 (clone A20, BV711), NK1.1 
(clone PK136, APC), Gr-1 (clone RB6-8C5, BV605), CD103 (clone 2E7, 
BV711); from BD Biosciences, CD5 (clone 53-7.3, APC), CD11c (clone 
HL3, APC), NK1.1 (clone PK136, BV605), CD4 (clone RM4-5, BV786), 
CD62L (clone MEL-14, APC), CD19 (clone 1D3, BV510), Ly6C (clone 
AL-21, PerCP-Cy5.5), Ly6G (clone 1A8, AF700), PD-1 (clone J43, BV605), 
TNF-α (clone MP6-XT22, BV510), IFN-γ (clone XMG1.2, APC-Cy7), 
CD90.2 (clone 53-2.1, BV786), T-bet (clone O4-46, BV711), RORγ-t 
(clone Q31-378, BV786), GATA3 (clone L50-823, PE-Cy7), and IL4 (clone 
11B11, BV650); from Thermo Fisher Scientific, CD3 (clone 17A2, Alexa 
Fluor 700), CD11b (clone M1/70, APC), CD11b (clone M1/70, PerCP-Cy5.5), 
CD8 (clone 53-6.7, Alexa Fluor 700), CD19 (clone 1D3, Alexa 
Fluor 700), FcεR1 (clone MAR-1, APC), F4/80 (clone BM8, PE-Cy5), 
CD3 (clone 145-2C11, PE-Cy7), MHC-II (clone M5/114.15.2, Alexa Fluor 
700), CD44 (clone IM7, PerCP-Cy5.5), CD127 (clone A7R34, FITC), 
CD25 (clone PC61.5, PerCP-Cy5.5), IL5 (clone TRFK5, PE), CD11c (clone 
N418, FITC), ST2 (clone RMST2-2, PE-Cy7), and FOXP3 (clone FJK-16S, 
APC); and from MBL International, SIINFEKL tetramer (catalogue 
#TB-5001-1, PE). 

Human ILCs were defined as live, CD45+, lineage− (CD3, CD5, CD56, 
CD11b, CD11c, CD16, CD19, TCRα/β, FcεR1), CD25+, CD127+ cells as 
previously described. Human cells were stained with the following 
antibodies: from BD Biosciences, GATA3 (clone L50-823, BV711), T-bet 
(clone O4-46, BV650), RORγ-t (clone Q21-559, PE); from Biolegend, 
CRTH2 (clone BM16, PE-Cy7), CD11b (clone ICRF44, APC), CD56 (clone 
NCAM16.2, BV650), CD25 (clone BC96, PerCP-Cy5.5), CD45 (clone HI30, 
Pacific Blue), TCRα/β (clone IP26, APC); from Thermo Fisher Scientific, 
CD16 (clone CB16, APC), CD11c (clone 3.9, APC), CD127 (clone RDR5, 
FITC), CD3 (clone OKT3, Alexa Fluor 700), ST2 (clone hIL33Rcap, PE), 
CD5 (clone L17F12, APC), CD19 (clone HIB19, AF700), FcεR1 (clone AER-37, 
APC). All samples for flow cytometry were prospectively collected 
from unselected patients with PDAC. 

To examine intracellular cytokine production, single-cell sus- 
pensions of tumours were stimulated for 6 h ex vivo with phorbol 
12-myristate (PMA, 100 ng/ml) and ionomycin (ng/ml) in the presence 
of brefeldin A (10 µg/ml) (all from Sigma-Aldrich) at 37 °C. Cells were 
then surface-stained, fixed, permeabilized, and stained for cytokine 
production using the Fixation and Permeabilization Buffer Kit per the 
manufacturer's recommendations (Invitrogen, Thermo Fisher Scientific). 
Appropriate isotype controls were used as indicated. Analysis 
was performed using FlowJo (versions 9 and 10, Tree Star). 


Immunohistochemistry 

Tissues were fixed in paraformaldehyde (Fisher Scientific) for 24 h and 
embedded in paraffin. The tissue sections were deparaffinized with 
EZPrep buffer (Ventana Medical Systems), then antigen retrieval was 
performed with CC1 buffer (Ventana Medical Systems). Sections were 
blocked for 30 min with Background Buster solution (Innovex), followed 
by avidin-biotin blocking for min (Ventana Medical Systems). Mouse 
IL33 (AF3626, R&D Systems), mouse smooth muscle actin (Abcam), 
and human IL33 (AF3625, R&D Systems) antibodies were applied, and 
sections were incubated for 4 h, followed by a 60-min incubation with 
biotinylated rabbit anti-goat IgG (Vector labs), or biotinylated goat 
anti-rabbit IgG (Vector labs) at 1:200 dilution. Detection was performed 
with a DAB detection kit (Ventana Medical Systems) according to the 
manufacturer's instructions. Any section containing cells demonstrating 
cytoplasmic or nuclear positivity for IL33 was designated to have 
positive staining. Slides were counterstained with Masson's trichrome, 
or haematoxylin and eosin, and coverslipped with Permount (Fisher 
Scientific). All histologic sections were evaluated by an independent 
PDAC pathologist. 


Mouse immunofluorescence 

IL33/CD11b/CK19/Iba1 immunofluorescence. Multiplex immunofluorescent 
staining was performed using a Discovery XT processor 
(Ventana Medical Systems) as described. 


IL33. First, sections were incubated with anti-mIL33 (R&D Systems, 
catalogue #AF3626, 1 µg/ml) for 4 h, followed by 60 min incubation with 
biotinylated horse anti-goat IgG (Vector Laboratories) at 1:200 dilution. 
Detection was performed with Streptavidin-HRP D (part of DABMap 
kit, Ventana Medical Systems), followed by incubation with Tyramide 
Alexa Fluor 488 (Invitrogen) prepared according to the manufacturer's 
instructions with predetermined dilutions. 


CD11b. Next, sections were incubated with anti-CD11b (Abcam, clone 
EPR1544) for 5 h, followed by 60 min incubation with biotinylated goat 
anti-rabbit IgG (Vector Laboratories) at 1:200 dilution. Detection was 
performed with Streptavidin-HRP D (part of DABMap kit, Ventana 
Medical Systems), followed by incubation with Tyramide Alexa 594 
(Invitrogen) prepared according to the manufacturer's instructions 
with predetermined dilutions. 


CK19. Next, slides were incubated with anti-CK19 (Abcam, clone 
EP1580Y) for 5 h, followed by 60 min incubation with biotinylated 
goat anti-rabbit (Vector Laboratories) at 1:200 dilution. Detection 
was performed with Streptavidin-HRP D (part of DABMap kit, Ventana 
Medical Systems), followed by incubation with Tyramide Alexa Fluor 
546 (Invitrogen) prepared according to the manufacturer's instructions 
with predetermined dilutions. 


Iba1. Finally, sections were incubated with anti-Iba1 (Wako, catalogue 
#019-19741) for 5 h, followed by 60 min incubation with biotinylated 
goat anti-rabbit IgG (Vector Laboratories) at 1:200 dilution. Detection 
was performed with Streptavidin-HRP D (part of DABMap kit, Ventana 
Medical Systems), followed by incubation with Tyramide Alexa 647 
(Invitrogen) prepared according to the manufacturer's instructions with 
predetermined dilutions. After staining, slides were counterstained 
with DAPI (Sigma-Aldrich) for 10 min and coverslipped with Mowiol. 


Human immunofluorescence 

Tissue sections were deparaffinized with proprietary Leica Bond buffer 
(Leica Biosystems), and antigen retrieval was performed with Leica 
Bond ER2 buffer (Leica Biosystems). First, sections were incubated with 
anti-PD-1 (Cell Marque, clone NAT105) for 1 h, followed by detection with 
Bond Polymer Refine Detection kit (Leica Biosystems) and Tyramide 
Alexa Fluor 488 (Invitrogen). Next, sections were incubated with anti-CD3 
(DAKO, catalogue #A0452) for 1 h, followed by detection with Bond 
Polymer Refine Detection kit (Leica Biosystems) and Tyramide CF594 
(Biotium). Next, sections were incubated with anti-GATA3 (Cell Marque, 
clone L50-823) for 1 h, followed by detection with Bond Polymer Refine 
Detection kit (Leica Biosystems) and CF543 (Biotium). Finally, sections 
were incubated with anti-CD45 (DAKO, clone 2B11 + PD7/26) for 1h, 
followed by detection with Bond Polymer Refine Detection kit (Leica 
Biosystems) and Tyramide Alexa Fluor 647 (Invitrogen). All detections 
were prepared according to the manufacturer's instructions with pre- 
determined dilutions. After staining, slides were counterstained with 
DAPI (Sigma-Aldrich) for 10 min and coverslipped with Mowiol. 


Digital image processing and analysis 

Slides were digitized using Panoramic Flash 250 (3Dhistech, Budapest, 
Hungary) using a Zeiss 20×/0.8 NA objective and custom filters for A488, 
A546, A594, and A647. Each core was exported into a multi-channel 
tiff file and analysed using a custom macro written in FIJI/ImageJ. For 
quantification, each nucleus was segmented using the DAPI channel 
after appropriate processing and background subtraction. Then for 
each nucleated cell, the presence or absence of the other markers was 
assessed after setting appropriate thresholds for each marker. The 
number of cells with specific combinations of markers was tallied. ILC2s 
were defined as CD45+ CD3− GATA3+ nucleated cells, PD-1-expressing 
ILC2s were defined as CD45+ CD3− GATA3+ PD-1+ nucleated cells, and 
PD-1-expressing T cells were defined as CD45+ CD3+ PD-1+ nucleated 
cells. For each patient, the frequency of each cell type as a fraction 
of all nucleated cells was calculated in triplicate cores, followed by 
determination of the mean frequency of triplicate cores to calculate 
the final cellular frequency per patient. 
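
The tallying step described above can be illustrated with a short Python sketch (the study itself used a custom FIJI/ImageJ macro, which is not reproduced here). The sketch assumes each nucleated cell has already been reduced to four thresholded marker calls, and that ILC2s are CD3-negative while T cells are CD3-positive; the function names and data layout are hypothetical.

```python
from statistics import mean
from typing import List, Set, Tuple

# One nucleated cell = (CD45 positive, CD3 positive, GATA3 positive, PD-1 positive)
Cell = Tuple[bool, bool, bool, bool]


def classify(cell: Cell) -> Set[str]:
    """Return the cell-type labels that a nucleated cell satisfies."""
    cd45, cd3, gata3, pd1 = cell
    labels = set()
    if cd45 and not cd3 and gata3:
        labels.add("ILC2")
        if pd1:
            labels.add("PD-1+ ILC2")
    if cd45 and cd3 and pd1:
        labels.add("PD-1+ T cell")
    return labels


def core_frequency(cells: List[Cell], label: str) -> float:
    """Fraction of all nucleated cells in one core carrying the label."""
    if not cells:
        return 0.0
    return sum(label in classify(c) for c in cells) / len(cells)


def patient_frequency(cores: List[List[Cell]], label: str) -> float:
    """Mean of the per-core frequencies (triplicate cores in the text)."""
    return mean(core_frequency(core, label) for core in cores)
```

Averaging the per-core fractions, rather than pooling cells across cores, follows the wording of the text: a frequency is computed in each of the triplicate cores and the mean is then taken.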


RNA sequencing 
Mouse. Tissues from mice with orthotopic PDACs (n = 6) were collected 
and dissociated into single-cell suspensions as described above. 
Tumour infiltrating leukocytes were positively selected by magneti- 
cally activated cell sorting using mouse CD45 MicroBeads (Miltenyi 
Biotec). Purification of magnetically activated sorted cells was con- 
firmed by flow cytometry and was >95%. RNA was isolated from the 
sorted cells using an RNeasy Plus Mini Kit (Qiagen). Poly(A) capture 
and paired-end RNA-seq were performed by the MSKCC Integrated 
Genomics Core Facility. Specifically, after RiboGreen quantification and 
quality control by Agilent BioAnalyzer, 500 ng of total RNA underwent 
polyA selection and TruSeq library preparation according to instruc- 
tions provided by Illumina (TruSeq Stranded mRNA LT Kit, catalogue 
#RS-122-2102), with eight cycles of PCR. Samples were barcoded and 
run on a HiSeq 4000 in a 100 bp/100 bp paired-end run, using the HiSeq 
3000/4000 SBS Kit (Illumina). An average of 83 million paired reads was 
generated per sample. Ribosomal reads represented at most 0.03% of 
the total reads generated, and the percentage of mRNA bases averaged 
76.6%. The expression data set was loaded into Gene Set Enrichment 
Analysis (GSEA) 3.0. Gene set databases for antigen presentation and 
T cell mediated immunity were selected from MSigDB v6.1, with a false 
discovery rate of <0.25 to facilitate exploratory discovery. GSEA was run 
with 1,000 permutations. Three gene set databases met this threshold: 
GO:0002474 (antigen processing and presentation of peptide antigen 
via MHC class I); GO:0002711 (positive regulation of T cell mediated 
immunity); and GSE19825 (naive vs day 3 effector CD8 T cell up). 


Single-cell RNA sequencing 

Library preparation for single-cell immune profiling, sequencing, and 
post-processing of the raw data were performed at the Epigenomics 
Core at Weill Cornell Medicine. 


Single-cell RNA library preparation and sequencing. Single-cell 
suspensions of fluorescence-activated cell (FAC)-sorted ILC2 cells from 
pancreatic KPC tumours and mesenteric DLNs from mice treated with 
vehicle, rIL33 alone, or rIL33 + anti-PD-1 were prepared as described 
above. scRNA-seq libraries were prepared according to 10X Genomics 
specifications (Chromium Single Cell V(D)J User Guide PN-1000006, 
10X Genomics). Four independent cellular suspensions (85-90% viable) 
at a concentration between 90 and 200 cells/µl were loaded onto 
the 10X Genomics Chromium platform to generate Gel Beads-in-Emulsion 
(GEM), targeting about 2,000 single cells per sample. After 
GEM generation, the samples were incubated at 53 °C for 45 min in a 
C1000 Touch Thermal cycler with 96-Deep Well Reaction Module (Bio-Rad) 
to generate polyA cDNA barcoded at the 5′ end by the addition 
of a template switch oligo (TSO) linked to a cell barcode and Unique 


Molecular Identifiers (UMIs). GEMs were broken, and the single-strand 
cDNA was cleaned up with DynaBeads MyOne Silane Beads (Thermo 
Fisher Scientific). The cDNA was amplified for 16 cycles (98 °C for 45 
s; 98 °C for 20 s, 67 °C for 30 s, 72 °C for 1 h). The quality of the cDNA 
was assessed using an Agilent Bioanalyzer 2100, obtaining a product 
of about 1,200 bp. Fifty nanograms of cDNA was enzymatically 
fragmented, end repaired, A-tailed, subjected to a double-sided size 
selection with SPRiselect beads (Beckman Coulter), and ligated to 
adaptors provided in the kit. A unique sample index for each library 
was introduced through 14 cycles of PCR amplification using the indexes 
provided in the kit (98 °C for 45 s; 98 °C for 20 s, 54 °C for 30 s, 
72 °C for 20 s × 14 cycles; 72 °C for 1 min; held at 4 °C). Indexed libraries 
were subjected to a second double-sided size selection, and libraries 
were then quantified using Qubit fluorometric quantification (Thermo 
Fisher Scientific). The quality was assessed on an Agilent Bioanalyzer 
2100, obtaining an average library size of 450 bp. No treatment samples 
had concentrations below detectable limits, and cDNA amplification 
was done with 18 cycles and sample index with 16 cycles. Libraries were 
diluted to 10 nM and clustered using a NovaSeq 6000 on a paired-end 
read flow cell and sequenced for 28 cycles on R1 (10X barcode and the 
UMIs), followed by 8 cycles of i7 index (sample index), and 89 bases 
on R2 (transcript), obtaining about 100 million clusters per sample, 
except for tumours from vehicle-treated mice (clustered at about 10 
million). Primary processing of sequencing images was done using Illumina's 
Real Time Analysis software (RTA). 10X Genomics Cell Ranger 
Single Cell Software suite v3.0.2 
(https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) 
was used to perform sample demultiplexing, alignment to 
mouse genomic reference mm10, filtering, UMI counting, single-cell 
5′ end gene counting, and quality control using the manufacturer's 
parameters. Data from approximately 11,000 single cells that passed 
quality control were obtained with approximately 41,000 mean reads 
per cell (48% sequencing saturation). 


scRNA-seq data processing. The Seurat R package version 3.1 pipeline 
was used to identify clusters on combined data sets. First, individual 
data sets were read into R as count matrices and converted into Seurat 
objects, selecting on genes expressed in ≥3 cells and on cells with at 
least 200 detected genes. A standard pre-processing workflow was 
then used to filter cells by excluding cells with either more than 
2,500 or fewer than 200 unique genes expressed, and cells with greater 
than 5% mitochondrial gene content. 
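
The filtering rules above can be sketched in plain Python to make the thresholds explicit. This is not the Seurat code used in the study: the data layout (a dictionary of per-cell gene counts), the function name, the `mt-` prefix for mitochondrial genes, and the reading of the gene filter as "expressed in at least 3 cells" are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict

Counts = Dict[str, int]  # gene name -> UMI count for one cell


def filter_cells_and_genes(cells: Dict[str, Counts],
                           min_cells: int = 3,
                           min_genes: int = 200,
                           max_genes: int = 2500,
                           max_pct_mito: float = 5.0,
                           mito_prefix: str = "mt-") -> Dict[str, Counts]:
    """Keep genes seen in >= min_cells cells, then drop low-quality cells."""
    # Gene filter: count the number of cells in which each gene is detected.
    detected = Counter(g for counts in cells.values()
                       for g, n in counts.items() if n > 0)
    keep_genes = {g for g, k in detected.items() if k >= min_cells}

    kept = {}
    for cell_id, counts in cells.items():
        counts = {g: n for g, n in counts.items() if g in keep_genes and n > 0}
        total = sum(counts.values())
        if total == 0:
            continue
        n_genes = len(counts)
        mito = sum(n for g, n in counts.items() if g.startswith(mito_prefix))
        pct_mito = 100.0 * mito / total
        # Cell filter: gene count within bounds and mitochondrial share low.
        if min_genes <= n_genes <= max_genes and pct_mito <= max_pct_mito:
            kept[cell_id] = counts
    return kept
```

The thresholds are exposed as parameters so that the same function can be exercised on toy data with smaller cut-offs.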

Following filtering, the samples were merged, and the gene expression 
measurements for retained cells were log-transformed, normalized 
by total expression per cell, and scaled to 10,000 molecules per cell. 
The top 2,000 highly variable genes across the single cells were then 
identified, and principal component (PC) analysis was conducted. 
After examining jackstraw and elbow plots, we selected the top 15 PCs 
for clustering using K-nearest neighbour (KNN) clustering with cluster 
resolution set at 0.4, identifying 6-8 clusters in all samples-combined 
and tumour-combined merged data sets. Nonlinear dimensional reduction 
with UMAP was used to visualize the data sets, also using the top 
15 PCs. Differential gene expression for gene marker discovery across 
the clusters was performed using the Wilcoxon rank sum test as used in 
the Seurat package. Pairwise comparison using Wilcoxon rank sum test 
was performed with the Holm P value adjustment method to compare 
gene expression between samples. 
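
Two of the numerical steps in this paragraph are simple enough to sketch directly. The Python below is a stand-in for illustration, not the R code used in the study: `log_normalize` assumes the normalization is the usual log(1 + count / total × 10,000) form, and `holm_adjust` implements the standard Holm step-down adjustment of P values. Both function names are hypothetical.

```python
import math
from typing import Dict, List


def log_normalize(counts: Dict[str, int], scale: float = 10_000.0) -> Dict[str, float]:
    """Scale one cell's counts to `scale` total molecules, then take log(1 + x)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: math.log1p(n / total * scale) for gene, n in counts.items()}


def holm_adjust(pvalues: List[float]) -> List[float]:
    """Holm step-down adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest P value is multiplied by m, the next by m - 1, and so on.
        value = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, value)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted


print(log_normalize({"A": 1, "B": 3}))
print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```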


In vitro assays 
KPC 4662 cells were cultured for 1 week in a 96-well flat-bottomed 
plate (Falcon) in complete medium: RPMI with L-glutamine (Gibco, 
Thermo Fisher Scientific) with 10% fetal bovine serum (Life Technolo- 
gies), 100 units/ml of penicillin, 100 µg/ml of streptomycin, and rIL33 
at concentrations of 0, 10, 100, and 500 ng/ml. Culture medium and 
cytokines were replenished every 48 h. Viability was measured using 


a colorimetric tetrazolium salt assay (Cell Counting Kit, Dojindo 
Molecular Technologies) per the manufacturer's instructionsand read 
ona Synergy HT Multi-Detection Microplate Reader (Biotek). Cells 
were collected and stained for Annexin V (Thermo Fisher Scientific), 
Ki-67 (clone SoIA1S, Thermo Fisher Scientific), and ST2 (clone RMST2, 
Thermo Fisher Scientific). For all in vitro experiments, 2-3 technical 
replicates were performed per independent experiment. 


In vitro dendritic cell migration assays. Mouse splenic DCs were
isolated and enriched using a mouse pan-DC isolation kit according to
the manufacturer's protocol (Miltenyi Biotech). Flow cytometry was
used to assess DC purity (>70% CD11c+ of live cells). Cells were plated in
complete RPMI medium at 510° cells/ml with 50 ng/ml of recombinant
mouse GM-CSF (Biolegend) overnight. Next, chemotaxis of splenic
DCs was analysed by transwell migration assays. RPMI (600 µl) with or
without 100 ng/ml of recombinant mouse CCL5 (Biolegend) was added
to the lower chambers of a 6.5-mm Transwell plate with 5.0-µm pore
polycarbonate membrane inserts (Sigma-Aldrich). RPMI (200 µl) was
also added to the upper chambers, and plates were allowed to
equilibrate at 37°C in 5% CO2 for 15 min. Splenic DCs (1 x 10° cells in
100 µl RPMI) were then loaded into the upper chambers and incubated at
37°C in 5% CO2 for 2 h. After incubation, membrane inserts were carefully
removed, and cells were collected from the lower chambers. Migrated
DCs were incubated with DAPI and CD11c antibodies for 20 min at 4°C,
and Precision Count Beads (Biolegend) were added to quantify the
number of live migrated CD11c+ cells using flow cytometry according
to the manufacturer's protocol.
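
Bead-based counting of this kind usually rests on a simple proportion: the ratio of gated cell events to bead events, multiplied by the known number of beads added to the tube. The sketch below shows that arithmetic only. The function names, the example numbers and the fold-change helper are hypothetical and are not taken from the manufacturer's protocol or from the authors' analysis.

```python
def absolute_count(cell_events: int, bead_events: int,
                   beads_added: float) -> float:
    """Estimate the absolute number of cells in a tube.

    cell_events: gated events for the population of interest
    bead_events: gated counting-bead events in the same acquisition
    beads_added: total beads pipetted into the tube (assumed known)
    """
    if bead_events <= 0:
        raise ValueError("bead_events must be positive")
    return cell_events / bead_events * beads_added


def migration_fold_change(migrated_with_chemokine: float,
                          migrated_without_chemokine: float) -> float:
    """Ratio of migrated cells with versus without chemokine
    (a hypothetical summary statistic)."""
    if migrated_without_chemokine <= 0:
        raise ValueError("control count must be positive")
    return migrated_with_chemokine / migrated_without_chemokine
```

For instance, 500 gated cell events against 1,000 bead events in a tube that received 50,000 beads corresponds to an estimated 25,000 cells.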


Statistics 

Data are expressed as median. As we observed many statistically
significant effects in the data without a priori sample size
calculations, no statistical methods were used to determine sample size.
Comparisons between two groups were performed using unpaired
Mann-Whitney test with the Benjamini-Krieger-Yekutieli false discovery
approach for multiple time point comparisons (two-tailed). Comparisons
among multiple groups were performed using one-way ANOVA test followed
by Kruskal-Wallis multiple comparison post-test. Comparisons among
multiple groups across multiple time points were performed using
two-way ANOVA test. Correlations between two variables were calculated
using linear regression. Survival curves were compared by two-sided
log-rank test. Tumour incidences were compared by χ² test. All alpha
levels were 0.05, with P < 0.05 considered a significant difference.
Statistical analyses were performed using Prism 7.0 (GraphPad Software).
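
For readers unfamiliar with the two-group test named above, the sketch below computes the Mann-Whitney U statistics from tie-averaged ranks. It is a generic textbook illustration with hypothetical function names, not a reproduction of the Prism analysis, and it stops at the U statistics without deriving a P value.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float],
                   y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return u_x, u_y
```

With completely separated groups such as (1, 2, 3) and (4, 5, 6), the statistics are 0 and 9, the most extreme values possible for three observations per group.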


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability


Source code for immune quantification is available in Supplementary
Data. Bulk RNA-seq data are available under Gene Expression Omnibus
(GEO) accession number GSE129388. scRNA-seq data are available
under GEO accession number GSE136720. Source data are provided for
all experiments. All other data are available from the corresponding
author upon reasonable request.


23. Donovan, C. et al. Roles for T/B lymphocytes and ILC2s in experimental chronic
obstructive pulmonary disease. J. Leukoc. Biol. 105, 143-150 (2018).

24. Evans, R. A. et al. Lack of immunoediting in murine pancreatic cancer reversed with
neoantigen. JCI Insight, 88328 (2016).

25. Sastra, S. A. & Olive, K. P. Quantification of murine pancreatic tumors by high-resolution
ultrasound. Methods Mol. Biol. 80, 249-266 (2013).

26. Monticelli, L. A. et al. Innate lymphoid cells promote lung-tissue homeostasis after
infection with influenza virus. Nat. Immunol. 12, 1045-1054 (2011).

27. Ma, Z. et al. Augmentation of immune checkpoint cancer immunotherapy with IL18. Clin.
Cancer Res. 22, 2969-2980 (2016).

28. Rooney, M. S., Shukla, S. A., Wu, C. J., Getz, G. & Hacohen, N. Molecular and genetic
properties of tumors associated with local immune cytolytic activity. Cell 160, 48-61
(2015).

29. Yarilin, D. et al. Machine-based method for multiplex in situ molecular characterization of
tissues by immunofluorescence detection. Sci. Rep. 5, 9534 (2015).

30. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell
transcriptomic data across different conditions, technologies, and species. Nat.
Biotechnol. 36, 411-420 (2018).


Acknowledgements We thank J. Novak, J. Moore and E. Patterson for editorial assistance;
B. Medina, G. Vitillo, J. hang, , Zeng, Rossi, J, Loo, N. Param, J. Maltbaek, O. Grbovic-Huezo,
Y. Senbabaoglu, M. Gigoux, R. Giese and S. Budhu for helpful discussions and technical
assistance; and the Epigenomics Core of Weill Cornell Medical College for technical
assistance with scRNA-seq. This work was supported by the V Foundation Convergence
Scholar Grant (A.M, 1.0. V.P.B.), the Stand Up to Cancer Convergence Award DW.
PB), the National Cancer Institute K12CA184746-01A1 (V.P.B.), the Damon Runyon Clinical
Investigator Award (V.P.B.), the Ben and Rose Cole Pia Foundation Scholar Award (V.P.B.), the
Sarah Min and Matthew Pincus Pancreatic Cancer Immunotherapy Award (V.P.B.), an
administrative supplement to NIH P30-CA008748 (S.D.L., V.P.B.), NIH R01 CA204228, NIH
P30CA023108 (SL), Swim Across America and the Ludwig Institute for Cancer Research
(DI, TM), and the Parker Institute for Cancer Immunotherapy (.D.W. LM). Services by the
MSKCC Small-Animal Core Facility and Integrated Genomics Core were funded by the
National Cancer Institute Cancer Center Support Grant (P30 CA008748.48), Cycle for
Survival and the Marie Josée and Henry R. Kravis Center for Molecular Oncology.


Author contributions V.P.B. conceived the study. AM, LL., LAR, SDL, JOW, RPO. TM.
and V.P.B. designed all the experiments. LAM, LL, LAR, JR., JZ., AR. and PB. performed
all the experiments. 8.6. assisted with generation of bone marrow chimaeras. L., ZS. and D.R
analysed the scRNA-seq results. G.A. and U.B. performed the pathologic analyses. .E., and
DAT. generated and assisted in experiments on autochthonous KPC mice. M. Gururajan
provided technical assistance with the PD-1 blocking antibody. M. Gönen provided statistical
oversight. JAM, JL, LAR., JR., SDL, RPD, TM. and V.P.B. analysed all the data. All authors
interpreted the data. A.M. and V.P.B. wrote the manuscript with input from all authors.


Competing interests V.P.B. is a recipient of an immuno-oncology translational research grant
from Bristol-Myers Squibb and is an inventor on a patent application related to work on
neoantigen modelling. S.D.L. is a member of the scientific advisory board of Nybo
Pharmaceuticals, and co-founder of Episteme Prognostics. J.D.W. is a consultant for Adaptive
Biotech, Advaxis, Amgen, Apricity, Array BioPharma, Ascentage Pharma, Astellas, Bayer,
Beigene, Bristol Myers Squibb, Celgene, Chugai, Elucida, Eli Lilly, F Star, Genentech, Imvaq,
Janssen, Kleo Pharma, Linneaus, Medimmune, Merck, Neon Therapeutics, Ono, Polaris
Pharma, Polynoma, Psioxus, Puretech, Recepta, Trieza, Sellas Life Sciences, Serametrix,
Surface Oncology and Syndax; is a recipient of research support from Bristol Myers Squibb,
Medimmune, Merck Pharmaceuticals and Genentech; has equity in Potenza Therapeutics,
Tizona Pharmaceuticals, Adaptive Biotechnologies, Elucida, Imvaq, Beigene, Trieza and
Linneaus; and has received an honorarium from Esanex. T.M. is a consultant for Immunos
Therapeutics and Pfizer; is a co-founder with equity in Imvaq Therapeutics; receives research
funding from Bristol-Myers Squibb, Surface Oncology, Kyn Therapeutics, Infinity
Pharmaceuticals Inc., Peregrine Pharmaceuticals Inc., Adaptive Biotechnologies, Leap
Therapeutics Inc. and Aprea; and is an inventor on patent applications related to work on
Oncolytic Viral Therapy, Alpha Virus Based Vaccine, Neo Antigen Modeling, CD40, GITR,
OX40, PD-1 and CTLA-4. M. Gururajan is an employee of Bristol-Myers Squibb and has financial
interest in the company.


Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2015-4.

Correspondence and requests for materials should be addressed to T.M. or V.P.B.

Peer review information Nature thanks Richard Locksley and the other, anonymous,
reviewer(s) for their contribution to the peer review of this work.

Reprints and permissions information is available at http://www.nature.com/reprints.


Extended Data Fig. 1 | Identification of IL33-dependent ILCs in pancreatic
cancer. a, Gating strategy to identify human ILCs. The first plot was pre-gated
on live (DRAQ7−) cells and singlets. Lineage 1 cocktail: CD5, CD11b, CD11c, CD16,
FcεR1. Lineage 2 cocktail: CD3, CD19, TCRα/β. ILCs were identified as lineage−
CD56− CD25+ CD127+ cells. FMO, fluorescence minus one. b, Representative
image of immunofluorescence of ILC2s in tumour tissue microarrays
from short- and long-term PDAC survivors (n = 96). Arrows, putative ILC2s.
c, Top, overall survival of patients with more (high) or less (low) than the
median intratumoral mRNA level of ILC-stimulating cytokines. Bottom,
correlation between expression of ILC-activating cytokines and immune
cytolytic index (CYT) in short- and long-term survivors of human PDAC. Curves
were fit by linear regression. n = 25. d, Gating strategy to identify mouse ILCs.
The first plot was pre-gated on live (DRAQ7−) cells and singlets. Lineage 1
cocktail: CD5, CD11b, CD11c, FcεRI. Lineage 2 cocktail: CD3, CD19. ILCs were
identified as lineage− NK1.1− CD25+ CD127+, and ILC2s were identified as lineage−
NK1.1− CD25+ ST2+ cells. Gating on orthotopic PDAC mice shown.
e, Intratumoral ILC frequency in orthotopic PDAC mice established with KPC
cell lines 81, 18-3, and in autochthonous KPC mice with spontaneous PDAC
(KPCSpont). Composite ILC frequencies from Fig. 1d and others are included for
comparison (KPC 4662). f, Phenotype of ILCs in PDAC mice. Grey curves,
isotype controls; numbers, mean fluorescence intensity. g, Expansion kinetics
of ILCs in tissues from PDAC mice. h, Changes in non-ILC cell frequency in
Rag2−/− PDAC mice treated with anti-CD90.2 or isotype antibodies. Data were
collected 14 days (d-f), 10 days (h), or at the indicated time points after tumour
implantation. n indicates individual mice analysed separately in at least two
independent experiments with n ≥ 2 per group. Median ± s.e.m.; horizontal bars
show median. P values determined by two-sided log-rank test (c, top), linear
regression (c, bottom), or two-tailed Mann-Whitney test (g). P values in g
indicate tumour comparisons to all other tissues.
Extended Data Fig. 2 | Host-derived IL33 activates pancreatic ILC2s. a, mRNA
expression of ILC1- (IL12, IL15, IL18), ILC2- (IL25, IL33, TSLP), and ILC3-inducer
cytokines (IL23) and the IL33 receptor (ST2) in orthotopic PDAC tumours (left)
and autochthonous PDAC tumours in KPC mice from a previously published
mRNA microarray (right). b, Representative IL33 immunohistochemistry
(IHC) of 133*° and 1L33"** human (tissue microarray, n = 96) and mouse PDAC
(n = 3 per group). c, Frequency of patients with PDAC demonstrating IL33
positivity by IHC in a human PDAC tumour microarray. d, Multiplexed
immunofluorescence for IL33, ductal marker CK19, and myeloid markers CD11b
and Iba1 in mouse PDAC (top). Arrows, IL33-expressing cells. IL33 mean
fluorescence intensity (MFI) in non-immune (CD45−), immune (CD45+),
macrophage (TAM), and monocytic and granulocytic myeloid-derived
suppressor cell (M-MDSC and G-MDSC) populations in tumours from IL33°
reporter PDAC mice (bottom). e, Representative IL33 protein expression
shown by IHC in orthotopic PDAC tumours in Il33+/+ (WT) mice, and
non-tumour-bearing pancreata in Il33 mice (n = 3 per group). f, ILC frequency (top)
and cell number (bottom) in organs and DLNs of Il33+/+ and Il33−/− orthotopic
PDAC mice. g, Gating and frequency of IL4 and IL5 expression in intratumoral
ILCs in Il33+/+ and Il33−/− orthotopic PDAC mice. h, i, ILC2 (h) and immune cell
frequencies (i) in orthotopic Rag2−/− and Rag2−/−γc−/− PDAC mice with or without
treatment with rIL33. j, Frequency of ST2+ tumour ILCs in mice with
subcutaneous (SQ) and orthotopic PDACs. k, Tumours in orthotopic and
subcutaneous PDAC mice. l, Tumour weight in Il33+/+ and Il33−/− littermate PDAC
mice. m, Experimental schema of bone-marrow chimaeras to evaluate
contribution of haematopoietic cell-derived IL33 to tumour control. n, o,
Haematopoietic cell reconstitution (n) and tumour weight (o) in irradiated
CD45.1 congenic mice reconstituted with either CD45.2 Il33+/+ or CD45.2 Il33−/−
bone marrow. Data were collected 14 (a, b, d-g, j, k (orthotopic), l, o), 28 (k
(subcutaneous)), or 10 (h, i) days after tumour implantation. Horizontal bars
show median. n indicates individual mice analysed separately in at least two
independent experiments with n ≥ 2 per group. P values determined by one-way
ANOVA (a) or two-tailed Mann-Whitney test (d, -h, j, l, o).
Extended Data Fig. 3 | Host-derived IL33 activates pancreatic T cell
immunity. a, Gene set enrichment analysis of bulk RNA-seq from purified
CD45+ immune cells from Il33+/+ and Il33−/− PDAC mice. Enrichment plots and
enrichment scores are shown for three gene sets comparing expression in
Il33+/+ to expression in Il33−/− mice (n = 3 mice per group). FDR, false discovery
rate. b, c, Gating of CD8+ T cells (b) and frequencies of various immune cell
types (c, left) and CD4+ T cell lineages (c, right) in Il33+/+ and Il33−/− orthotopic
PDAC mice. d, Frequency of T central memory (TCM) cells (CD45+ CD3+ CD8+
CD44+ CD62L+) in tumour DLNs and non-tumour draining distant lymphoid
organs (inguinal lymph node and spleen) in Il33+/+ and Il33−/− orthotopic PDAC
mice. e, Frequency of CD8+ T cells in subcutaneous PDAC tumours. NK, natural
killer cells; NKT, natural killer T cells; Treg, regulatory T cells; MDSC,
myeloid-derived suppressor cells; DC, dendritic cells. Data were collected 14 days after
tumour implantation or at the time points indicated. Median ± s.e.m.;
horizontal bars show median. n indicates individual mice analysed separately
in at least two independent experiments with n ≥ 2 per group. P values
determined by one-way ANOVA (@).
Extended Data Fig. 4 | IL33 and ILCs do not directly induce tumour cell
death. a, Tumour weight in Rag2−/− and Rag2−/−γc−/− PDAC mice treated with
vehicle or rIL33. b, Representative haematoxylin and eosin-stained sections
(left) with histologic tumour cell differentiation status in Il33+/+ and Il33−/− PDAC
mice (right). c, Trichrome staining in tumours from Il33+/+ and Il33−/− PDAC mice
(n = 3 per group). d, Immunohistochemistry for smooth muscle actin in
tumours from Il33+/+ and Il33−/− PDAC mice (n = 3 per group). e, Intratumoral ST2
expression on KPC cells in Il33+/+ and Il33−/− orthotopic PDAC mice. f, ST2
expression on live KPC cells following rIL33 treatment in vitro (DRAQ7 stains
dead cells) (n = 3 per group). g, KPC cell number, viability, proliferation (Ki-67),
and apoptosis (annexin) following rIL33 treatment in vitro (n = 3-6 per group).
Horizontal bars show median. n in a-e indicates individual mice analysed
separately in at least two independent experiments with n ≥ 3 per group. n in f, g
indicates technical replicates and is representative of at least two independent
experiments. P value determined by two-tailed Mann-Whitney test (a).
Extended Data Fig. 5 | ILC2s prime antigen-specific CD8+ T cells. a, Gating and
frequency of TILC2s in ILC2-intact (diphtheria toxin (DT)-treated Cd4‘**Icos"*)
and ILC2-depleted (DT-treated Cd4°™*Icos"2™*) mice. b, Gating and frequency
of OVA-specific CD8+ T cells in spleens from ILC2-intact and ILC2-depleted
mice. OVA-specific T cells were detected as SIINFEKL-tetramer+ cells. c, Gating
and frequency of TCM cells (CD45+ CD3+ CD8+ CD44+ CD62L+) in tumour, DLNs,
and spleens from ILC2-intact and ILC2-depleted mice. d, ST2 expression on
CD45+ CD3+ CD8+ T cells after tumour implantation in PDAC mice. Data were
collected 14 days after tumour implantation or at the time points indicated.
Median ± s.e.m.; horizontal bars show median. n indicates individual mice
analysed separately in at least two independent experiments with n ≥ 2 per
group. P values determined by two-tailed Mann-Whitney test (a-c) and
two-way ANOVA with Tukey's multiple comparison post-test (d, indicating
comparison of tumour ILCs to all other groups).
Extended Data Fig. 6 | Immunophenotyping in rIL33-treated PDAC mice.
a, Tumour establishment of orthotopic and subcutaneous KPC-OVA PDAC
tumours in vehicle (veh) and rIL33-treated mice. b, Gating (left) and frequency
(right) of IL18R1 expression on tumour ILCs in subcutaneous (SQ) and
orthotopic PDAC mice. c, Gating (left) and frequency (right) of splenic ILC2s
following rIL33 treatment in orthotopic PDAC mice. d, Gating (left), frequency
(middle), and number (right) of TILC2s following rIL33 treatment in
subcutaneous PDAC mice. e, Gating (left) and frequency (right) of cytokine and
PD-1 expression on tumour CD8+ T cells following rIL33 treatment in orthotopic
PDAC mice. f, Frequency of immune cells in vehicle- and rIL33-treated
orthotopic PDAC mice. g, Gating strategy for identification of CD103+ DCs.
h, Gating (left; tumours) and frequency (right) of ILC2s in tumours and DLNs
from wild-type (WT) or ILC2-deficient (Rora™"1I74™) PDAC mice following rIL33
treatment. i, Gating (left) and frequency (right) of PD-1+ CD8+ T cells in tumours
from rIL33-treated wild-type (WT) and Batf3 mice. Data were collected 6 (a),
4 (b), 2 (c, e, f), 5 (d), 7 (h), and 3 (i) weeks after tumour implantation. Horizontal
bars show median. n indicates individual mice analysed separately in at least
two independent experiments with n ≥ 2 per group. P values determined by
χ² test (a), two-tailed Mann-Whitney test (all else).
Extended Data Fig. 7 | scRNA-seq of tumour and DLN ILC2s in PDAC mice.
a, Experimental design for in vivo treatment, purification, and single-cell
analysis of ILC2s. b, c, Quality metrics. b, Scatter plots showing, for each cell,
the relationship between the number of UMIs and the number of genes.
c, Violin plots showing the distribution of the number of genes (left), number of
UMIs (middle), and percentage of normalized reads from mitochondrial genes
(right) in each treatment group (columns), and each tissue (rows). Each dot
represents a single cell. For each treatment group and tissue, data represent
pooled purified single cells from biological replicates of n = 10 (vehicle),
n = 5 (rIL33), and n = 5 (anti-PD-1 + rIL33) PDAC mice.


Extended Data Fig. 8 | Activated ILC2s from tumours and DLNs have distinct
transcriptional features. a-d, Single-cell analysis of 1,634 rIL33-activated
tumour and DLN ILC2s (experimental design as in Extended Data Fig. 7a). UMAP
plots show single cells (dots) in a nonlinear representation of the top 15
principal components. Expression of ILC2 (Gata3, Id2, Rora) and ILC3 (gene,
Rorc; protein, RORγT) transcription factors (TFs) (a), ILC2 surface markers (b),
and ILC clusters and tissues (tumour and DLN) (c). Expression of the ILC1
transcription factor Tbx21 (T-bet) was undetectable. d, e, Differentially
expressed genes by cluster (d) and tissue (e). f, Distribution of Ccl5 expression
from ILC2s in tumours (n = 752 cells) and DLNs (n = 882 cells); violin plots show
distribution with minima, maxima, and circle indicating median. Each dot in
a-e represents a single cell. For each treatment group and tissue, data
represent pooled purified single cells from biological replicates of n = 5
rIL33-treated PDAC mice. P value in f by two-sided pairwise Wilcoxon rank sum
test.
Extended Data Fig. 9 | Combined anti-PD-1 and rIL33 treatment induces a
unique transcriptional profile in TILC2s. a, Expression of coinhibitory
immune checkpoints in TILC2s in vehicle-treated PDAC mice by scRNA-seq.
b, Gating and frequency of PD-1+ ILC2s in vehicle- and rIL33-treated PDAC mice.
c, ILC2 frequency in treated PDAC mice. Corresponding tumour volumes, cell
number, and scRNA-seq are shown in Fig. 4a-c. d, scRNA-seq of ILC2s from
treated PDAC mice. Expression of ILC1 (gene, Tbx21; protein, T-bet), ILC2
(Gata3, Id2, Rora), and ILC3 (gene, Rorc; protein, RORγT) transcription factors
(TFs) in purified tumour and DLN ILC2s. Corresponding UMAP plots by cluster
and treatment are depicted in Fig. 4c. e-g, Top differentially expressed genes
by treatment and tissue (e), cluster (f), and distribution of expression for select
differentially expressed genes by treatment and tissue (g). h, UMAP plots of
3,415 single TILC2s in a nonlinear representation of the top 15 principal
components. i, Differentially expressed genes in TILC2s by treatment. Each dot
in d represents a single cell; in d-i, for each treatment group and tissue, data
represent pooled purified single cells from biological replicates of n = 10
(vehicle), n = 5 (rIL33), and n = 5 (anti-PD-1 + rIL33) PDAC mice (number of single
cells for tumour: vehicle n = 28, rIL33 n = 752, rIL33 + anti-PD-1 n = 2,635; or DLN:
rIL33 n = 882, rIL33 + anti-PD-1 n = 2,725). Violin plots show distribution with
minima, maxima, and circle indicating median. Horizontal bars show median.
P values by two-tailed Mann-Whitney test (b, c) and two-sided pairwise
Wilcoxon rank sum test (g).
Extended Data Fig. 10 | Activated TILC2s express PD-1 and co-exist with PD-1+
T cells. a, b, Orthotopic PDAC mice (C57BL/6 WT, Pdcd1−/−, CD45.1) were treated
with 500 ng of carrier-free rIL33 daily for 10 days (experimental designs shown
in Fig. 4e, f). Live, CD45+, lineage−, CD90+, CD25+, ST2+ TILC2s were sort-purified
to 98% purity at day 10 post implantation. TILC2s (5* 10° cells) were
immediately transferred to orthotopic PDAC tumour-bearing ILC2-deficient
(Rora""177F'™") CD45.2 mice on days 7 and 14 post-tumour implantation via i.p.
injection. Control mice received equivalent volumes of PBS via i.p. injection.
a, Representative plots for TILC2 sort purification (top) and post-sort purity
(bottom). b, Representative plots showing PD-1 expression on sort-purified
TILC2s from wild-type and CD45.1 mice in the experimental designs outlined in
Fig. 4e, f. c, Survival and intratumoral CD8+ T cell frequency of orthotopic KPC
4662 and KPC 52 PDAC tumours; horizontal bars in c show median.
d, Frequency of PD-1+ ILC2s (left) and correlation with PD-1+ T cells (right) in
human PDACs. e, Linear regression analysis of IL33 and PDCD1 mRNA in bulk
tumour transcriptomes from short- and long-term human PDAC survivors (left)
and survival association of PD-1+ cells in tumour tissue microarrays of short-
and long-term PDAC survivors (right); high and low defined as higher or lower
than the median for the cohort. f, Model linking the IL33-TILC2 axis to T cell
immunity in PDAC. ©2019, Memorial Sloan Kettering Cancer Center.
g, Distribution of expression of costimulatory molecules in untreated TILC2s
by scRNA-seq. Experimental design as shown in Extended Data Fig. 7a; data
represent pooled purified single cells from biological replicates of n = 10
(vehicle). Data are representative of purity and PD-1 expression on sorted
TILC2s in two independent experiments with n ≥ 4 per group (a, b). n and data
points denote individual mice and patients analysed separately. P values
determined by two-tailed Mann-Whitney test (c), two-sided log-rank test
(c, e, survival curves) and linear regression (d, e).
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section, 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g, Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above.


Software and code 


Policy information about availability of computer code


Data collection Flow cytometric data were collected using FACSDiva (BD Biosciences, version 8.0.1), Pathologic slides were digitized using Panoramic 
Flash 250 (3Dhistech, Budapest Hungary) using Zeiss 20x/0.8NA objective and custom filters for A488, AS45, AS94 and A647. 


Data analysis Transcriptomic analyses were done using the Affymetrix Transcription Analysis Console (TAC) Software (version 3.0, Applied Biosystems,
SST-RMA algorithm) to summarize the signal from array probesets (Ref 14). Flow cytometry data were analyzed on FlowJo (Treestar,
versions 9.9.6 and 10.4.2). Digital cellular quantification was performed using a custom macro written in FIJI/ImageJ (version 1.52n). For
RNA sequencing, the expression dataset was loaded into Gene Set Enrichment Analysis (GSEA, version 3.0) to identify biological
processes that were differentially expressed in experimental groups. Gene set databases for antigen presentation and T cell mediated
immunity were selected from MSigDB (version 6.1). Single cell RNA sequencing analysis was performed with Cell Ranger Single Cell
Software suite v3.0.2 (10x Genomics) and Seurat R package (version 3.2). Illumina Real Time Analysis Software (1.17) was used for RNA
sequencing data quality assessment and data preparation. Statistical analysis was performed using Prism (GraphPad software, version
7.0).


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.


We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement, This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data
- A description of any restrictions on data availability


All raw source data for all experiments included in this study are provided. Bulk RNA sequencing data are available under Gene Expression Omnibus (GEO) accession
number GSE129388. Single cell RNA sequencing data are available under GEO accession number GSE136720. Code for immune quantification is provided in
Supplementary Data. All other data are available from the corresponding author upon reasonable request.
Life sciences study design


All studies must disclose on these points even when the disclosure is negative.


Sample size | Sample sizes were determined based on our and other investigators' experience with the respective cell lines used. No statistical methods
were used as we observed many statistically significant effects in the data with the above methods of sample size selection without a priori
sample size calculations.
Data exclusions No data were excluded from the analyses, 


Replication | All findings reported were reproducible and data shown are pooled from ≥2 independent experiments, with comparable results in each
experiment.


Randomization | 6- to 12-week-old mice were matched by age and sex and randomly assigned to specific treatment groups.


Blinding | No blinding was performed in experimental mouse interventions as knowledge of the treatment groups was required. To account for
heterogeneity in tissue samples in human tumor cellular quantification, the cellular frequency for each tumor was calculated as the mean of 3
independent measurements of 3 randomly sampled biopsies of the same tumor sample. All histopathologic assessments were performed by a
dedicated pancreatic pathologist who was blinded to the treatment groups. All digital quantification was automated, with all experimental
groups quantified in an identical fashion.
Materials & experimental systems (n/a | Involved in the study): Antibodies; Eukaryotic cell lines; Palaeontology; Animals and other organisms; Human research participants; Clinical data

Methods (n/a | Involved in the study): ChIP-seq; Flow cytometry; MRI-based neuroimaging

Antibodies 


Antibodies used Flow cytometry:
Human (Name Fluorochrome Clone Supplier Catalog Lot Number Dilution)
CD11b APC ICRA Biolegend 301310 8278347 1:20
CD11c APC 3.9 ThermoFisher Scientific 17-0116-42 4329675 1:20
CD127 FITC RDR5 ThermoFisher Scientific 11-1278-42 1971614 1:20
CD16 APC CB16 ThermoFisher Scientific 17-0168-42 2013794 1:20
CD19 AF700 HIB19 ThermoFisher Scientific 56-0199-42 2031137 1:20
CD25 PerCP-Cy5.5 BC96 Biolegend 302626 8267552 1:20
CD3 Alexa Fluor 700 OKT3 ThermoFisher Scientific 56-0037-42 1984155 1:20
CD45 Pacific Blue HI30 Biolegend 304029 8256106 1:20
CD5 APC L17F12 ThermoFisher Scientific 17-0058-42 1956856 1:20
CD56 BV650 NCAM16.2 Biolegend 318344 8282104 1:20
CRTH2 PE-Cy7 BM16 Biolegend 350118 8287080 1:20
FcεR1 APC AER-37 ThermoFisher Scientific 17-5899-42 4300320 1:20
GATA3 BV711 L50-823 BD Biosciences 565449 6140701 1:20
ST2 PE hIL33Rcap ThermoFisher Scientific 12-9338-42 2041179 1:20
TCRa/b APC IP26 Biolegend 306730 8251913 1:20


Validation 


Mouse (Name Fluorochrome Clone Supplier Catalog Lot Number Dilution)
CD103 BV711 2E7 Biolegend 121435 8266330 1:40
CD11b APC M1/70 ThermoFisher Scientific 45-0112-82 1929457 1:80
CD11c APC HL3 BD Biosciences 550261 9129856 1:200
CD127 FITC A7R34 ThermoFisher Scientific 11-1271-85 2083447 1:50
CD19 Alexa Fluor 700 1D3 ThermoFisher Scientific 56-0193-82 4345832 1:80
CD25 PerCP-Cy5.5 PC61.5 ThermoFisher Scientific 45-0251-82 4289647 1:80
CD3 Alexa Fluor 700 17A2 ThermoFisher Scientific 56-0032-82 4336536 1:80
CD4 BV786 RM4-5 BD Biosciences 563727 7166502 1:200
CD44 PerCP-Cy5.5 IM7 ThermoFisher Scientific 45-0441-82 1984139 1:80
CD45 Pacific Blue 30-F11 Biolegend 103126 8253970 1:200
CD45.1 BV711 A20 Biolegend 110739 8245194 1:40
CD5 APC 53-7.3 BD Biosciences 550035 8187632 1:200
CD62L APC MEL-14 BD Biosciences 561919 7047558 1:80
CD8 Alexa Fluor 700 53-6.7 ThermoFisher Scientific 56-0081-82 4329739 1:16
CD90.2 BV786 53-2.1 BD Biosciences 564365 8186556 1:100
F4/80 PE-Cy5 BM8 ThermoFisher Scientific 15-4801-82 4316699 1:80
FceR1 APC MAR-1 ThermoFisher Scientific 17-5898-82 2095487 1:160
FoxP3 APC FJK-16s ThermoFisher Scientific 17-5773-82 4313491 1:20
Gata3 PE-Cy7 L50-823 BD Biosciences 560405 8242846 1:5
Gr-1 BV605 RB6-8C5 Biolegend 108439 8219337 1:40
IFN-g APC-Cy7 XMG1.2 BD Biosciences 561479 7153814 1:100
IL4 BV650 11B11 BD Biosciences 564004 7256960 1:100
IL5 PE TRFK5 ThermoFisher Scientific 12-7052-82 4312191 1:80
Ly6C PerCP-Cy5.5 AL-21 BD Biosciences 560525
Ly6G AF700 1A8 BD Biosciences 561236 7200759 1:100
MHC-II Alexa Fluor 700 M5/114.15.2 ThermoFisher Scientific 56-5321-82 1919519 1:330
NK1.1 BV650 PK136 BD Biosciences 564143 8162883 1:80
PD1 BV605 J43 BD Biosciences 563059 7047607 1:100
Rorg-t BV786 Q31-378 BD Biosciences 564723 7117673 1:200
ST2 PE-Cy7 RMST2-2 ThermoFisher Scientific 25-9335-82 2035263 1:80
Tbet BV711 O4-46 BD Biosciences 563320 8150719 1:20
TNF-a BV510 MP6-XT22 BD Biosciences 563386 8138693 1:80
SIINFEKL tetramer MBL International N/A TB-5001-1 T1708003 1:50



Other experiments:
Anti-alpha smooth muscle Actin Antibody (PE-Cy7) Abcore Inc 1A4 AC12-0159-17 Lot # not available 1:500-1:1000
Biotinylated Goat Anti-Rabbit IgG Vector Laboratories Polyclonal BA-1000 Lot # not available 1:200
Biotinylated Horse Anti-Goat IgG Vector Laboratories Polyclonal BA-9500 Lot # not available 1:200
Biotinylated Rabbit Anti-Goat IgG Vector Laboratories Polyclonal BA-5000 Lot # not available 1:200
CD45 Antibody (2B11 + PD7/26) DAKO 2B11 + PD7/26 NBP2-34287 Lot # not available 1:200-1:400
GATA3 (L50-823) Mouse Monoclonal Antibody Cell Marque L50-823 390M 0000032957 1:400
hIL-33 PE MAB R&D Systems/Fisher Scientific Polyclonal AF3625 ACAPO216021 1:13-1:40
IL-33R (ST2) Monoclonal Antibody (RMST2-2), PE-Cyanine7, eBioscience ThermoFisher Scientific RMST2-2 25-9335-82 4298148 1:80
InVivoMAb anti-mouse CD16/CD32 BioXCell 2.4G2 BE0307 636517D1 1:20
InVivoMAb anti-mouse Thy1.2 (CD90.2) BioXCell 30-H12 BE0066 Lot # not available 500 ug/mouse
mouse CD4 (11723081) BioXCell GK1.5 BP0003 628316D18 500 ug/mouse
mouse CD8a (11723145) BioXCell 2.43 BP0061 653047M28 500 ug/mouse
InVivoPlus rat IgG2b isotype control (11723013) BioXCell LTF-2 BP0090 62981701 500 ug/mouse
Ki-67 Monoclonal Antibody (SolA15) ThermoFisher Scientific SolA15 46-5698-80 4296883 1:33
Microglia Marker Iba1 Antibody Wako Polyclonal 019-19741 Lot # not available 1:200
Mouse IL-33 Antibody R&D Systems Polyclonal AF3626 PDJ1519052 1:400
PD-1 (NAT105) Mouse Monoclonal Antibody Ventana Roche NAT105 760-4895 N/A 1:100
Recombinant Anti-CD11b antibody [EPR1344] Abcam EPR1344 AB133357 Lot # not available 1:1000
Recombinant Anti-Cytokeratin 19 antibody [EP1580Y] - Cytoskeleton Marker Abcam EP1580Y AB52625 GR249900 1:200-1:500
CD3, Polyclonal, Unconjugated, Affinity Isolated Antibody DAKO F7.2.38 A0452 Lot # not available 1:400


All antibodies were validated by the manufacturer and used per their instructions. In our experiments, isotype and/or FMO
control samples were included. Specific assessments of differential expression in our model system were performed by comparing
expression characteristics in tumor tissue to non-tumor adjacent normal tissue (adjacent normal pancreas), tumor regional
lymphoid organs (draining lymph node), and non-tumor lymphoid organs (spleen).


Additional information on validation can be found on the manufacturers' websites listed below.
Mouse flow cytometry antibodies:

BD Biosciences, CD11c APC HL3, # 550261 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/cell-surface-antigens/apc-hamster-anti-mouse-cd11c-hl3/p/550261

BD Biosciences, NK1.1 BV650 PK136, # 564143 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/cell-surface-antigens/bv650-mouse-anti-mouse-nk-11-pk136/p/564143

BD Biosciences, CD4 BV786 RM4-5, # 563727 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/th-1-cells/surface-markers/mouse/bv786-rat-anti-mouse-cd4-rm4-5/p/563727

BD Biosciences, CD62L APC MEL-14, # 561919 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/regulatory-t-cells/surface-markers/mouse/apc-rat-anti-mouse-cd62-mel-14/p/561919

BD Biosciences, CD19 AF700 1D3, # 562956 https://www.bdbiosciences.com/us/applications/research/stem-cell-research/hematopoietic-stem-cell-markers/mouse/negative-markers/alexa-fluor-700-rat-anti-mouse-cd19-1d3/p/557958

BD Biosciences, Ly6C PerCP-Cy5.5 AL-21, # 560525 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/cell-surface-antigens/percp-cy55-rat-anti-mouse-ly-6c-al-21/p/560525

BD Biosciences, Ly6G AF700 1A8, # 561236 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/cell-surface-antigens/alexa-fluor-700-rat-anti-mouse-ly-6g-1a8/p/561236

BD Biosciences, PD1 BV605 J43, # 563059 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/regulatory-t-cells/surface-markers/mouse/bv605-hamster-anti-mouse-cd279-j43/p/563059

BD Biosciences, TNF-a BV510 MP6-XT22, # 563386 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/th-1-cells/intracellular-markers/cytokines-and-chemokines/mouse/bv510-rat-anti-mouse-tnf-mp6-xt22/p/563386

BD Biosciences, IFN-g APC-Cy7 XMG1.2, # 561479 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/th-1-cells/intracellular-markers/cytokines-and-chemokines/mouse/apc-cy7-rat-anti-mouse-ifn--xmg12/p/561479

BD Biosciences, CD90.2 BV786 53-2.1, # 564365 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/cell-surface-antigens/bv786-rat-anti-mouse-cd902-53-21/p/564365

BD Biosciences, Tbet BV711 O4-46, # 563320 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/th-1-cells/intracellular-markers/cell-signalling-and-transcription-factors/mouse/bv711-mouse-anti-t-bet-04-46/p/563320

BD Biosciences, Rorg-t BV786 Q31-378, # 564723 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/cell-biology-reagents/cell-biology-antibodies/bv786-mouse-anti-mouse-rort-q31-378/p/564723

BD Biosciences, Gata3 PE-Cy7 L50-823, # 560405 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/th-2-cells/intracellular-markers/cell-signalling-and-transcription-factors/human/pe-cy7-mouse-anti-gata3-l50-823/p/560405

BD Biosciences, IL4 BV650 11B11, # 564004 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/immunology-reagents/anti-mouse-antibodies/intracellular-antigens/bv650-rat-anti-mouse-il-4-11b11/p/564004

BD Biosciences, CD5 APC 53-7.3, # 550035 https://www.bdbiosciences.com/us/applications/research/t-cell-immunology/regulatory-t-cells/surface-markers/mouse/apc-rat-anti-mouse-cd5-53-73/p/550035


Biolegend, CD45 Pacific Blue 30-F11, # 103126 https://www.biolegend.com/en-us/products/pacific-blue-anti-mouse-cd45-antibody-3102

Biolegend, CD45.1 BV711 A20, # 110739 https://www.biolegend.com/en-us/products/brilliant-violet-711-anti-mouse-cd45-antibody-8925

Biolegend, Gr-1 BV605 RB6-8C5, # 108439 https://www.biolegend.com/en-us/products/brilliant-violet-605-anti-mouse-ly-6g-ly-6c-gr-1-antibody-8724

Biolegend, CD103 BV711 2E7, # 121435 https://www.biolegend.com/en-us/products/brilliant-violet-711-anti-mouse-cd103-antibody-14411


ThermoFisher Scientific, CD3 Alexa Fluor 700 17A2, # 56-0032-82 https://www.thermofisher.com/antibody/product/CD3-Antibody-clone-17A2-Monoclonal/56-0032-82

ThermoFisher Scientific, CD19 Alexa Fluor 700 1D3, # 56-0193-82 https://www.thermofisher.com/antibody/product/CD19-Antibody-clone-eBio1D3-1D3-Monoclonal/56-0193-82

ThermoFisher Scientific, FceR1 APC MAR-1, # 17-5898-82 https://www.thermofisher.com/antibody/product/FceR1-alpha-Antibody-clone-MAR-1-Monoclonal/17-5898-82

ThermoFisher Scientific, CD8 Alexa Fluor 700 53-6.7, # 56-0081-82 https://www.thermofisher.com/antibody/product/CD8a-Antibody-clone-53-6-7-Monoclonal/56-0081-82

ThermoFisher Scientific, CD11b APC M1/70, # 45-0112-82 https://www.thermofisher.com/antibody/product/CD11b-Antibody-clone-M1-70-Monoclonal/45-0112-82

ThermoFisher Scientific, F4/80 PE-Cy5 BM8, # 15-4801-82 https://www.thermofisher.com/antibody/product/F4-80-Antibody-clone-BM8-Monoclonal/15-4801-82

ThermoFisher Scientific, MHC-II Alexa Fluor 700 M5/114.15.2, # 56-5321-82 https://www.thermofisher.com/antibody/product/MHC-Class-II-I-A-I-E-Antibody-clone-M5-114-15-2-Monoclonal/56-5321-82

ThermoFisher Scientific, CD44 PerCP-Cy5.5 IM7, # 45-0441-82 https://www.thermofisher.com/antibody/product/CD44-Antibody-clone-IM7-Monoclonal/45-0441-82

ThermoFisher Scientific, CD127 FITC A7R34, # 11-1271-85 https://www.thermofisher.com/antibody/product/CD127-Antibody-clone-A7R34-Monoclonal/11-1271-85

ThermoFisher Scientific, CD25 PerCP-Cy5.5 PC61.5, # 45-0251-82 https://www.thermofisher.com/antibody/product/CD25-Antibody-clone-PC61-5-Monoclonal/45-0251-82

ThermoFisher Scientific, IL5 PE TRFK5, # 12-7052-82 https://www.thermofisher.com/antibody/product/IL-5-Antibody-clone-TRFK5-Monoclonal/12-7052-82

ThermoFisher Scientific, ST2 PE-Cy7 RMST2-2, # 25-9335-82 https://www.thermofisher.com/antibody/product/IL-33R-ST2-Antibody-clone-RMST2-2-Monoclonal/25-9335-82

ThermoFisher Scientific, FoxP3 APC FJK-16s, # 17-5773-82 https://www.thermofisher.com/antibody/product/FOXP3-Antibody-clone-FJK-16s-Monoclonal/17-5773-82


Human flow cytometry antibodies:

BD Biosciences, GATA3 BV711 L50-823, # 565449 https://www.bdbiosciences.com/us/reagents/research/antibodies-buffers/cell-biology-reagents/cell-biology-antibodies/bv711-mouse-anti-gata3-l50-823/p/565449

Biolegend, CD56 BV650 NCAM16.2, # 318344 https://www.biolegend.com/en-us/products/brilliant-violet-650-anti-human-cd56-ncam-antibody-8780

Biolegend, CD25 PerCP-Cy5.5 BC96, # 302626 http://www.biolegend.com/en-us/products/percpcyanine55-anti-human-cd25-antibody-4231

Biolegend, CD11b APC ICRF44, # 301310 https://www.biolegend.com/en-us/products/apc-anti-human-cd11b-antibody-765

Biolegend, CRTH2 PE-Cy7 BM16, # 350118 https://www.biolegend.com/en-us/products/pe-cy7-anti-human-cd294-crth2-antibody-8815

Biolegend, CD45 Pacific Blue HI30, # 304029 https://www.biolegend.com/en-us/products/pacific-blue-anti-human-cd45-antibody-3331

Biolegend, TCRa/b APC IP26, # 306730 https://www.biolegend.com/en-us/products/alexa-fluor-700-anti-human-tcr-alpha-beta-antibody-12517

ThermoFisher Scientific, CD3 Alexa Fluor 700 OKT3, # 56-0037-42 https://www.thermofisher.com/antibody/product/CD3-Antibody-clone-OKT3-Monoclonal/56-0037-42

ThermoFisher Scientific, CD127 FITC RDR5, # 11-1278-42 https://www.thermofisher.com/antibody/product/CD127-Antibody-clone-eBioRDR5-Monoclonal/11-1278-42

ThermoFisher Scientific, CD11c APC 3.9, # 17-0116-42 https://www.thermofisher.com/antibody/product/CD11c-Antibody-clone-3-9-Monoclonal/17-0116-42

ThermoFisher Scientific, CD16 APC CB16, # 17-0168-42 https://www.thermofisher.com/antibody/product/CD16-Antibody-clone-eBioCB16-CB16-Monoclonal/17-0168-42

ThermoFisher Scientific, ST2 PE hIL33Rcap, # 12-9338-42 https://www.thermofisher.com/antibody/product/IL-33R-ST2-Antibody-clone-hIL33Rcap-Monoclonal/12-9338-42

ThermoFisher Scientific, CD5 APC L17F12, # 17-0058-42 https://www.thermofisher.com/antibody/product/CD5-Antibody-clone-L17F12-Monoclonal/17-0058-42

ThermoFisher Scientific, CD19 AF700 HIB19, # 56-0199-42 https://www.thermofisher.com/antibody/product/CD19-Antibody-clone-HIB19-Monoclonal/56-0199-42

ThermoFisher Scientific, FceR1 APC AER-37, # 17-5899-42 https://www.thermofisher.com/antibody/product/FceR-alpha-Antibody-clone-AER-37-CRA1-Monoclonal/17-5899-42


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) All tumor cell lines were derived from KPC mice that were backcrossed more than 10 generations with C57Bl/6 mice. KPC
4662 and KPC 4662 OVA (V6 clone) cells were derived from Pdx1-Cre;LSL-KrasG12D/+;LSL-Trp53R172H/+ mice (Ref. 37). KPC 8-1,
18-3, and 52 cells were derived from Ptf1a-Cre;LSL-KrasG12D/+;LSL-Trp53R172H/+ mice.


Authentication All cell lines were authenticated as bona fide pancreatic cancer cell lines. This was based on histopathologic verification by a
dedicated pancreatic cancer pathologist that these cell lines generate tumors on intra-pancreatic implantation that faithfully
recapitulate features of both human pancreatic cancers and pancreatic cancers that develop in spontaneous genetically
engineered mice.


Mycoplasma contamination Cell lines were regularly tested using MycoAlert Mycoplasma Detection Kit (Lonza). None of the cell lines used in this study
tested positive for Mycoplasma.


Commonly misidentified lines (See ICLAC register) No commonly misidentified lines were used in this study.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals C57BL/6 (wild type, WT, CD45.2), C57BL/6 CD45.1, Pdcd1-/-, Rag2-/-, Rag2-/-gc-/-, and Batf3-/- mice were purchased from
Jackson Labs. Il33-/- and Il33Cit/+ mice were a gift from M.J. Rosen, and have been previously described (Il33-/- PMID: 20937871;
Il33Cit/+ ). Cd4Cre/+; Icosfl-Dtr/+ and Il7rCre/+; Rorafl/fl mice were a gift from A.N.J. McKenzie and have been previously described
(Refs. 16). Pdx-Cre; LSL-Kras-G12D; LSL-Trp53R172H/+ (KPC) mice have been previously described (ref 4). For all experiments, 6-
to 12-week-old mice were matched by age and sex and randomly assigned to specific treatment groups, with at least two
independent experiments performed throughout. Both male and female animals were utilized. KPC mice were sacrificed when
tumors were detectable by ultrasound. Animals were bred and maintained in a specific pathogen-free animal facility at Memorial
Sloan Kettering Cancer Center.


Wild animals No wild animals were used.
Field-collected samples No field-collected samples were used in this study.
Ethics oversight All animal studies were in accordance with the Institutional Animal Care and Use Committee protocol at Memorial Sloan
Kettering Cancer Center.


Note that full information on the approval of the study protocol must also be provided in the manuscript.


Human research participants 


Policy information about studies involving human research participants 


Population characteristics All tumor samples were from patients with surgically resected primary pancreatic ductal adenocarcinomas. Patients in the tissue 
microarray cohort have been previously described (Ref. 3). Clinical characteristics of patients in the flow cytometry and 
transcriptomic cohorts are outlined in Supplementary Tables 1 and 2, respectively, and are provided below. 


Flow cytometry cohort (n=25); presented in Supplementary Table 1
Gender: Male 16 (64%); Female 9 (36%)
Age, Median (Range): 65 (50-77)
Tumor Location: Head 20 (80%), Body/Tail 5 (20%)
Procedure: Distal Pancreatectomy 5 (20%), Pancreaticoduodenectomy 20 (80%)
Pathological Stage: IA 3 (12%), IB 2 (8%), IIA 4 (16%), IIB 14 (56%), III 2 (8%), IV 0 (0%)
pT: 1 3 (12%), 2 6 (24%), 3 16 (64%), 4 0 (0%)
pN: 0 9 (36%), 1 14 (56%), 2 2 (8%)
pM: 0 25 (100%), 1 0 (0%)
Margin: Positive 5 (20%), Negative 20 (80%)
Adjuvant Treatment: Yes 19 (76%), No 6 (24%)
Neoadjuvant Treatment: Yes 11 (44%), No 14 (56%)
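
The percentages in the flow cytometry cohort summary follow directly from the counts and the cohort size (n=25). A minimal sketch of that arithmetic is given below; the dictionary layout and function names are illustrative assumptions, and only a subset of the variables is transcribed.

```python
# Counts transcribed from the flow cytometry cohort summary (n = 25).
# The data layout and helper names are illustrative, not from the study.
COHORT_SIZE = 25

counts = {
    "Gender": {"Male": 16, "Female": 9},
    "Tumor location": {"Head": 20, "Body/Tail": 5},
    "pN": {"0": 9, "1": 14, "2": 2},
    "Margin": {"Positive": 5, "Negative": 20},
}


def percentages(category_counts, total):
    """Return each count as a whole-number percentage of total."""
    return {k: round(100 * v / total) for k, v in category_counts.items()}


def check_totals(all_counts, total):
    """Return the names of variables whose counts do not sum to total."""
    return [name for name, c in all_counts.items() if sum(c.values()) != total]


if __name__ == "__main__":
    for name, c in counts.items():
        print(name, percentages(c, COHORT_SIZE))
    print("Variables with inconsistent totals:", check_totals(counts, COHORT_SIZE))
```

Running the same check on every variable is a quick way to catch transcription errors in a summary table of this kind.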


Transcriptomic cohort (short-term survivors [n=12] vs. long-term survivors [n=13]); presented in Supplementary Table 2
Male: 6 (50%) vs. 5 (38%)
Female: 6 (50%) vs. 8 (62%)
Age, Median (Range): 76 (53-83) vs. 58 (45-88)
Tumor Location:
Head, 8 (67%) vs. 10 (77%)
Body/Tail, 4 (33%) vs. 3 (23%)
Procedure:
Distal Pancreatectomy, 4 (33%) vs. 3 (23%)
Pancreaticoduodenectomy, 8 (67%) vs. 10 (77%)
Total Pancreatectomy, 0 (0%) vs. 0 (0%)
Pathological Stage:
I, 0 (0%) vs. 0 (0%)
II, 9 (75%) vs. 12 (92%)
III, 1 (8%) vs. 1 (8%)
IV, 2 (17%) vs. 0 (0%)
pT:
1, 0 (0%) vs. 0 (0%)
2, 0 (0%) vs. 0 (0%)
3, 10 (83%) vs. 12 (92%)
4, 2 (17%) vs. 1 (8%)
pN:
0, 5 (42%) vs. 6 (46%)
1, 7 (58%) vs. 7 (54%)
pM:
0, 10 (83%) vs. 13 (100%)
1, 2 (17%) vs. 0 (0%)
Margin:
Positive, 4 (33%) vs. 1 (8%)
Negative, 8 (67%) vs. 12 (92%)
Adjuvant Treatment:
Yes, 9 (75%) vs. 10 (77%)
No, 3 (25%) vs. 3 (23%)
Unknown, 0 (0%) vs. 0 (0%)


Recruitment All pancreatic ductal adenocarcinoma patients eligible for surgical resection at Memorial Sloan Kettering Cancer Center were
recruited to participate in an Institutional Review Board-approved protocol. All patients who provided informed consent had
samples collected; all study procedures were conducted in strict compliance with all ethical and institutional regulations. As
patients were only recruited at Memorial Sloan Kettering Cancer Center, there is the potential for institution-specific selection
bias. We do not believe that this potential bias would impact the results of this study.


Ethics oversight All tissues were collected at Memorial Sloan Kettering Cancer Center under study protocol #15-149, which was approved by the
Memorial Sloan Kettering Cancer Center Institutional Review Board. Informed consent was obtained for all patients. The study
was in strict compliance with all institutional ethical regulations. All tumor samples were surgically resected primary pancreatic
ductal adenocarcinomas.


Note that full information on the approval of the study protocol must also be provided in the manuscript.


Flow Cytometry 


Plots


Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.


Methodology


Sample preparation


Bone Marrow Harvest
Bone marrow was harvested from CD45.2 congenically-labeled donor mice, filtered through a 70-µm filter, centrifuged, and
resuspended in sterile PBS to a concentration of 1e8 live cells per 200 µl. CD45.1 congenically-labeled C57BL/6J recipient mice
were irradiated (5.5 Gy x 2, 6 hours apart) 24 hours before bone marrow transplant and were maintained on enrofloxacin water
for 4 weeks post irradiation. Single-cell suspensions of CD45.2 bone marrow chimera in sterile PBS (1e8 live cells per recipient
mouse) were transplanted into each recipient mouse by retroorbital injection. Reconstitution was confirmed by flow cytometry of
the peripheral blood at 4 and 8 weeks post transplantation.


ILC2 adoptive transfer
CD45.1 C57BL/6 or Pdcd1-/- orthotopic PDAC mice were treated with 500 ng of carrier-free recombinant murine IL33 (R&D
Systems) in sterile PBS daily for 10 days. Live, CD45+, lineage-, CD90+, CD25+, ST2+ tumor ILC2s were sort-purified to 98% purity
at day 10 post-implantation using an Aria Cell sorter (BD Biosciences). 5 x 10^5 tumor ILC2s were immediately transferred to
orthotopic PDAC tumor-bearing Il7rCre/+Rorafl/fl CD45.2 mice on days 7 and 14 post tumor implantation via i.p. injection.
Control mice received equivalent volumes of PBS via i.p. injections. αPD-1 treatment in recipient mice was initiated on the day of
ILC2 cell transfer.


Human tumor transcriptomic profiling 
Patient subsets were randomly selected to undergo transcriptomic profiling as previously described (Ref 14). Patients in the TMA 
cohort with tumor tissue available for transcriptomic assessment were included in analyses in Figure 1b to allow protein 
confirmation of RNA expression. Extracted RNA was qualified on an Agilent BioAnalyzer and quantified by fluorometry 
(Ribogreen). Preparation of RNA for whole-transcriptome expression analysis was done using the WT Pico Reagent Kit 
(Affymetrix). Reverse transcription was initiated at the poly-A tail as well as throughout the entire length of RNA to capture both 
coding and multiple forms of non-coding RNA. RNA amplification was achieved using low-cycle PCR followed by linear 
amplification using T7 in vitro transcription technology. The cRNA was then converted to biotinylated sense-strand DNA 
hybridization targets. The prepared target was hybridized to GeneChip Human Transcriptome Array 2.0 (Affymetrix). Wash and
scan were performed using the GeneChip Hybridization, Wash and Stain kit using a Fluidics Station 450/250. Arrays were scanned
using the GeneChip Scanner 3000.


Mouse RNA sequencing 
Tissues from orthotopic PDAC mice (n=6) were harvested and dissociated into single-cell suspension as described above. Tumor-infiltrating
leukocytes were positively selected by magnetically-activated cell sorting using mouse CD45 MicroBeads (Miltenyi
Biotec). Purification of magnetically-activated sorted cells was confirmed by flow cytometry and was >95%. RNA was isolated
from the sorted cells using an RNeasy Plus Mini Kit (Qiagen). Poly(A) capture and paired-end RNA sequencing were performed by
the Memorial Sloan Kettering Integrated Genomics Core Facility. Specifically, after RiboGreen quantification and quality control
by Agilent BioAnalyzer, 500 ng of total RNA underwent polyA selection and TruSeq library preparation according to instructions
provided by Illumina (TruSeq Stranded mRNA LT Kit, catalog # RS-122-2102), with 8 cycles of PCR. Samples were barcoded and
run on a HiSeq 4000 in a 100 bp/100 bp paired-end run, using the HiSeq 3000/4000 SBS Kit (Illumina). An average of 83 million
paired reads was generated per sample. Ribosomal reads represented at most 0.03% of the total reads generated, and the
percentage of mRNA bases averaged 76.6%.


Mouse single cell RNA sequencing 
Single-cell suspensions of FACS-purified ILC2 cells from vehicle, IL33 alone and IL33 + PD1-treated pancreatic KPC tumors and
mesenteric draining lymph nodes were prepared (purity >=98%). Single-cell RNA-seq libraries were prepared according to 10x
Genomics specifications (Chromium Single Cell V(D)J User Guide PN-1000006, 10x Genomics, Pleasanton, CA, USA). Four
independent cellular suspensions (85-90% viable) at a concentration between 90-200 cells/µl were loaded onto the 10x
Genomics Chromium platform to generate Gel Beads-in-Emulsion (GEM), targeting about 2000 single cells per sample. After
GEM generation, the samples were incubated at 53°C for 45 min in a C1000 Touch Thermal cycler with 96-Deep
Well Reaction Module (Bio-Rad, Hercules) to generate polyA cDNA barcoded at the 5′ end by the addition of a template switch
oligo (TSO) linked to a cell barcode and Unique Molecular Identifiers (UMIs). GEMs were broken and the single-strand cDNA was
cleaned up with DynaBeads MyOne Silane Beads (Thermo Fisher Scientific, Waltham, MA). The cDNA was amplified for 16 cycles
(98°C for 45 s; 98°C for 20 s, 67°C for 30 s, 72°C for 1 hr). Quality of the cDNA was assessed using an Agilent Bioanalyzer 2100
(Santa Clara, CA), obtaining a product of about 1200 bp. 50 ng of cDNA was enzymatically fragmented, end repaired, A-tailed,
subjected to a double-sided size selection with SPRIselect beads (Beckman Coulter, Indianapolis, IN) and ligated to adaptors
provided in the kit. A unique sample index for each library was introduced through 14 cycles of PCR amplification using the
indexes provided in the kit (98°C for 45 s; 98°C for 20 s, 54°C for 30 s, and 72°C for 20 s x 14 cycles; 72°C for 1 min; held at 4°C).
Indexed libraries were subjected to a second double-sided size selection, and libraries were then quantified using Qubit
fluorometric quantification (Thermo Fisher Scientific, Waltham, MA). The quality was assessed on an Agilent Bioanalyzer 2100,
obtaining an average library size of 450 bp. No-treatment samples had concentrations below detectable limits; for these, cDNA
amplification was done with 18 cycles, and sample indexing with 16 cycles. Libraries were diluted to 10 nM and clustered using a
NovaSeq 6000 on a paired-end read flow cell and sequenced for 28 cycles on R1 (10x barcode and the UMIs), followed by 8 cycles of
i7 index (sample index), and 89 bases on R2 (transcript), obtaining about 100M clusters per sample, except for tumors from
vehicle-treated mice, which were clustered at about 10M. Primary processing of sequencing images was done using Illumina's Real
Time Analysis software (RTA). 10x Genomics Cell Ranger Single Cell Software suite v3.0.2
(https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/what-is-cell-ranger) was used to perform sample demultiplexing,
alignment to mouse genomic reference mm10, filtering, UMI counting, single-cell 5′-end gene counting and quality
control using the manufacturer parameters. Data from approximately 11,000 single cells that passed quality control were
obtained with approximately 41,000 mean reads per cell (48% sequencing saturation).
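
The closing figures of this section (about 11,000 cells, about 41,000 mean reads per cell, 48% sequencing saturation) can be related by simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the read counts passed to the saturation function are made-up inputs, and the saturation formula is the commonly cited form, one minus the fraction of reads that remain after deduplication, which is an assumption here rather than something stated in the text.

```python
def total_reads(n_cells, mean_reads_per_cell):
    """Approximate total reads assigned to cells passing quality control."""
    return n_cells * mean_reads_per_cell


def sequencing_saturation(n_reads, n_deduplicated_reads):
    """Fraction of reads that are duplicates of an already observed molecule.

    Assumed definition: 1 - (deduplicated reads / total reads).
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    return 1.0 - n_deduplicated_reads / n_reads


if __name__ == "__main__":
    # Reported approximate values: 11,000 cells and 41,000 mean reads per cell.
    print(f"approx. total reads: {total_reads(11_000, 41_000):,}")
    # Hypothetical counts chosen only to illustrate a 48% saturation.
    print(f"saturation: {sequencing_saturation(100, 52):.2f}")
```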


Flow Cytometry 
Mouse and human PDAC tumors and adjacent pancreata were mechanically dissociated and incubated in collagenase
(collagenase II for murine tumors, collagenase IV for human tumors, both 5 mg/ml; Worthington Biochemical Corp., Fisher
Scientific), DNAse I (0.5 mg/ml; Roche Diagnostics), and Hank's balanced salt solution (Gibco, Fisher Scientific) for 30 minutes at
37°C. Digestion was then quenched with fetal bovine serum (FBS, Life Technologies), and cells were filtered sequentially through
100- and 40-µm nylon cell strainers (Falcon, Fisher Scientific). Tumors, adjacent pancreata, and lymph nodes were then
mechanically dissociated and filtered through 100- and 40-µm nylon cell strainers (Falcon, Fisher Scientific) using PBS with 1%
FBS (Life Technologies). Spleens were mechanically dissociated and filtered through 70- and 40-µm nylon cell strainers (Falcon,
Fisher Scientific) using PBS with 1% FBS, followed by RBC lysis (RBC lysis buffer, ThermoFisher Scientific). Mouse Fc receptors
were blocked with FcεRIII/II-specific antibody (1 µg per 1 x 10^6 cells; clone 2.4G2; Bio X Cell


Instrument Flow cytometry data was collected on a BD LSR Fortessa (BD Biosciences). Flow cytometry sorting was performed on a BD FACS
Aria (BD Biosciences).


Software Flow cytometry data was analyzed using FlowJo (Treestar, versions 9.9.6 and 10.4.2).


Cell population abundance Tumor-infiltrating leukocytes were positively selected by magnetically-activated cell sorting using mouse CD45 MicroBeads
(Miltenyi Biotec). Purification of magnetically-activated sorted cells was confirmed by flow cytometry and was >95%.


Gating strategy Mouse ILCs were defined as live, CD45+, lineage- (CD3, CD5, NK1.1, CD11b, CD11c, CD19, FcεR1), CD25+, CD127+ cells
(Extended Data Fig. 1d, as previously described, Ref. 7); mouse ILC2s were defined as live, CD45+, lineage-, CD25+, ST2+ cells
(Extended Data Fig. 1, as previously described, Ref. 7). Central memory T cells were defined as live CD45+, CD3+, NK1.1-, CD8+,
CD62L+, CD44+ (Extended Data Fig. 5c). Dendritic cells were defined as live CD45+, CD3-, NK1.1-, Gr1-, F4/80-, CD19-, CD11c+,
MHC-II+ (Extended Data Fig. 6g).


Human ILCs were defined as live CD45+, lineage- (CD3, CD5, CD56, CD11b, CD11c, CD16, CD19, TCRα/β, FcεR1), CD25+, CD127+
cells as previously described (Extended Data Fig. 1a, Ref. 7).
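
The gating definitions above are conjunctions of marker states, so they can be written as boolean predicates. The sketch below encodes the mouse ILC2 gate (live, CD45+, lineage-, CD25+, ST2+) for events that have already been thresholded into positive or negative calls per marker. The event representation, the key names and the helper functions are assumptions for illustration; the study itself gated in FlowJo.

```python
# Lineage markers listed for the mouse gate; key names are illustrative.
MOUSE_LINEAGE_MARKERS = ("CD3", "CD5", "NK1.1", "CD11b", "CD11c", "CD19", "FceR1")


def is_lineage_negative(event, lineage_markers=MOUSE_LINEAGE_MARKERS):
    """True when the event is negative for every lineage marker."""
    return not any(event.get(marker, False) for marker in lineage_markers)


def is_mouse_ilc2(event):
    """Apply the mouse ILC2 gate: live, CD45+, lineage-, CD25+, ST2+."""
    return bool(
        event.get("live", False)
        and event.get("CD45", False)
        and is_lineage_negative(event)
        and event.get("CD25", False)
        and event.get("ST2", False)
    )


if __name__ == "__main__":
    # Hypothetical events: each maps a marker name to a positive/negative call.
    events = [
        {"live": True, "CD45": True, "CD25": True, "ST2": True},
        {"live": True, "CD45": True, "CD25": True, "ST2": True, "CD3": True},
        {"live": False, "CD45": True, "CD25": True, "ST2": True},
    ]
    n_ilc2 = sum(is_mouse_ilc2(e) for e in events)
    print(f"{n_ilc2} of {len(events)} events fall in the ILC2 gate")
```

Markers absent from an event are treated as negative here, which is a simplification; real data would carry a measured value for every channel.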


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Gene expression and cell identity controlled
by anaphase-promoting complex
January 2020


Metazoan development requires the robust proliferation of progenitor cells, the
identities of which are established by tightly controlled transcriptional networks.
As gene expression is globally inhibited during mitosis, the transcriptional programs
that define cell identity must be restarted in each cell cycle, but how this is
accomplished is poorly understood. Here we identify a ubiquitin-dependent
mechanism that integrates gene expression with cell division to preserve cell identity.
We found that WDR5 and TBP, which bind active interphase promoters, recruit the
anaphase-promoting complex (APC/C) to specific transcription start sites during
mitosis. This allows APC/C to decorate histones with ubiquitin chains branched at
Lys11 and Lys48 (K11/K48-branched ubiquitin chains) that recruit p97 (also known as
VCP) and the proteasome, which ensures the rapid expression of pluripotency genes
in the next cell cycle. Mitotic exit and the re-initiation of transcription are thus
controlled by a single regulator (APC/C), which provides a robust mechanism for
maintaining cell identity throughout cell division.


Published online: 19 February 2020


The self-renewal of stem cells endows organisms with the capacity
to establish or regenerate their many tissues, but the misregulation
of self-renewal contributes to tumorigenesis, tissue degeneration or
ageing. Although tightly controlled transcriptional networks establish
the identity of self-renewing stem cells during interphase, changes in
chromatin architecture and the activity of transcription factors restrict
the synthesis of messenger RNA (mRNA) during mitosis. Stem cells
must therefore restart their gene-expression programs each time they
enter a new cell cycle, which is facilitated by promoter elements that
remain unwound during mitosis, hypersensitive to DNase I, and
accessible to RNA polymerase II and transcription factors such as the
TATA-box binding protein TBP. How dividing cells retain hallmarks
of interphase transcription to preserve their identity is incompletely
understood.


APC/C sustains stem cell identity

To understand how pluripotency is preserved through cell division,
we fused green fluorescent protein (GFP) to the OCT4 (also known as
POU5F1) locus of human embryonic stem (ES) cells. Diploid OCT4-GFP
human ES cells responded to differentiation cues with an efficiency
similar to that of their untagged counterparts (Extended Data Fig. 1a,
b). Using lentiviral infection with pooled short hairpin (sh)RNAs, we
depleted about 900 enzymes and effectors of ubiquitylation, which
control cell division and differentiation; propagated OCT4-GFP
human ES cells in pluripotency medium, or briefly induced differentiation
by neural conversion; and then deep-sequenced populations
with low versus high levels of OCT4-GFP (Fig. 1a). shRNAs that
decreased OCT4-GFP abundance in self-renewing human ES cells
target pluripotency factors, whereas shRNAs that sustained OCT4-GFP
expression upon neural conversion deplete proteins that are needed
for robust differentiation.

We recovered the positive-control OCT4, as well as known stem-cell
E3 ligases such as DDB1, TRIM28 and UBR5, as pluripotency
factors (Fig. 1b, Extended Data Fig. 1c). Consistent with the need for
human ES cells to preserve genomic and proteomic integrity, we identified
proteins involved in DNA repair (DDB1, RNF168 and USP7) and
quality-control pathways (BAG6, HUWE1, PSMA1, PSMA6, UBR5 and
UBXN7). Many of the enzymes of the latter pathways bind or produce
K11/K48-branched ubiquitin chains, which we confirmed in human
ES cells (Extended Data Fig. 1d). Physiological pairs of E3 ligases and
deubiquitylases (such as HUWE1 and USP7) clustered according to
their opposing activities. Importantly, the APC2 subunit of APC/C was
required for pluripotency, whereas the counteracting deubiquitylase
USP44 supported differentiation (Fig. 1b, Extended Data Fig. 1c, e).
Other subunits of APC/C and APC/C-specific E2 enzymes scored as
pluripotency factors, with P values that were slightly below our stringent
screen cut-off (Extended Data Fig. 1c).

We confirmed that the depletion of subunits of APC/C, of the mitotic
coactivator of APC/C (CDC20) or of APC/C-specific E2 enzymes inhibited
human ES cell pluripotency, as revealed by decreased levels of
OCT4 and NANOG (Fig. 1c, Extended Data Fig. 2a-c). Although less pronounced
than its effects on protein levels, depletion of APC2 reduced
the abundance of OCT4 and NANOG mRNA (Extended Data Fig. 2d).
Human ES cells arrested in S phase and unable to enter mitosis did
not require APC/C for pluripotency (Extended Data Fig. 2e), indicating
that APC/C acts during cell division. However, it was unlikely that
APC/C inhibition interfered with pluripotency simply by stalling mitotic


Howard Hughes Medical Institute, University of California at Berkeley, Berkeley, CA, USA. Department of Molecular and Cell Biology, University of California at Berkeley, Berkeley, CA, USA.
Department of Molecular Machines and Signaling, Max Planck Institute of Biochemistry, Martinsried, Germany. Department of Biochemistry and Biophysics, University of California at San

Fig. 1 | APC/C stabilizes human ES cell identity. a, Schematic of the
ultracomplex shRNA screen. hESC, human ES cell. b, shRNA screen identifies
genes that are important for pluripotency. Each dot (n = 886 unique genes)
represents the P value of a gene (two-sided Mann-Whitney U test, not corrected
for multiple hypothesis testing), calculated from comparing the collection of
shRNAs that target each gene to all negative-control shRNAs measured in each
subpopulation (low versus high levels of OCT4-GFP). Orange, genes that
encode enzymes or effectors of K11/K48 branched-chain synthesis; red, genes
that encode deubiquitylases that oppose K11/K48-specific E3 ligases; blue,
genes that encode DNA-repair enzymes; and green, positive controls. UFD2 is
also known as UBE4B. Knockdown of genes indicated below and to the left of
zero results in lower levels of OCT4-GFP; depletion of genes indicated above
and to the right of zero maintains or increases the level of OCT4-GFP.
c, Western blot of pluripotency markers upon APC/C-subunit knockdown in
asynchronous H1 human ES cells. This experiment was performed five
independent times with similar results. siCTRL, control small interfering (si)
RNA; siAPC2, siCDC20, siUBE2C and siUBE2S denote siRNAs against
APC2, CDC20, UBE2C and UBE2S, respectively. siUBE2C/S, siRNA against UBE2C
and UBE2S. d, Interaction network of APC/C, WDR5 and USP44. Values listed in
parentheses are total spectral counts of tryptic peptides of indicated proteins;
values separated by a solidus denote proteins that coprecipitate with APC3
(left) or USP44 (right).


progression, as loss of the APC/C-specific E2 enzyme UBE2C diminished
OCT4 and NANOG levels without affecting the G2/M population (Fig. 1c,
Extended Data Fig. 2f). Collectively, these findings indicated that the
essential mitotic regulator APC/C also helps to preserve the stem-cell
state, identifying APC/C as a strong candidate for maintaining cell
identity through cell division.


APC/C works with WDR5 in human ES cells

We speculated that the identification of APC/C or USP44 substrate adaptors
required for pluripotency might point to ubiquitylated proteins that
preserve human ES cell identity. Using mass spectrometry, we found
that USP44, in addition to known partners, also engaged WDR5, a chromatin-associated
factor that binds methylated histone H3K4 at active
interphase promoters (Fig. 1d). Endogenous APC/C also interacted
with WDR5 during mitosis (Fig. 1d), which we confirmed by reciprocal
purification of WDR5 (Extended Data Fig. 3a). In addition, mitotic WDR5
bound the transcription factor TFIID (which includes TBP), as well as
chromatin remodellers INO80 and CHD1 (Extended Data Fig. 3a).

Fig. 2 | WDR5 is an APC/C substrate coadaptor. a, Immunoprecipitation (IP) of
endogenous APC3 from HeLa cells reveals that APC/C binds WDR5 and TBP in
mitosis. Prometaphase HeLa cells were released into fresh medium to restart
the cell cycle. This experiment was performed three independent times with
similar results. b, Immunoprecipitation of endogenous WDR5 from HeLa cells
confirms that WDR5 associates with APC/C subunits and TBP in mitosis. This
experiment was performed three independent times with similar results.
c, Sequential immunoprecipitations of APC/C in complex with Flag-tagged
WDR5 from mitotic HEK293T cells reveal that APC/C-WDR5 and TBP form a
ternary complex. Flag-WDR5 was first purified from prometaphase cells, and
next purified with anti-APC3. This experiment was performed once.
d, Endogenous APC3 immunoprecipitations from control versus WDR5-depleted
human ES cells show that the association of APC/C with TBP is bridged
through WDR5. This experiment was performed twice with similar results.
siWDR5, siRNA against WDR5. Asterisk denotes nonspecific band. e, The
approximately 20 Å resolution negative-stain electron-microscopy model
corroborates the association of WDR5 with the catalytic core of APC/C. f, Flag-WDR5
purified from mitotic HeLa cells contains active APC/C. Flag-tagged
wild-type (WT) WDR5 or Flag-WDR5(ΔWIN) were purified from mitotic HeLa
cells, and incubated with E1, UBE2C, UBE2S, ubiquitin, ATP and 35S-labeled
geminin. This experiment was performed two independent times with similar
results. Autorad., autoradiography.


As with APC/C and TFIID-TBP, depleting WDR5 diminished OCT4
and NANOG levels in human ES cells (Extended Data Fig. 3b). Human
ES cells that are unable to enter mitosis did not require WDR5 for
pluripotency (Extended Data Fig. 2e), which suggests that WDR5 acts
during cell division. Consistently, the loss of WDR5 in human ES cells
decreased the levels of K11-linked, as well as K11/K48-branched, ubiquitin
chains (the mitotic products of APC/C) to an extent similar to
that seen after depletion of APC2 (Extended Data Fig. 3b). As in mouse
ES cells, loss of WDR5 did not affect mitotic duration (Extended Data
Fig. 3c), but codepletion of WDR5 and APC2 caused human ES cells to
die shortly after exiting mitosis (Extended Data Fig. 3d-g). These findings
suggested that WDR5 cooperates with APC/C to ensure human ES
cell identity and survival, and does not impinge on the role of APC/C
in controlling cell division.

Reciprocal immunoprecipitations of endogenous proteins from
somatic cells showed that APC/C, WDR5 and TBP engage each other during
early mitosis, when APC/C binds CDC20 (Fig. 2a, b). A similar mitotic
increase in the interaction between APC/C and WDR5 was seen in human
ES cells (Extended Data Fig. 3h). Sequential affinity purifications revealed
that APC/C, WDR5 and TBP were part of the same complex (Fig. 2c), the
formation of which depended on WDR5 (Fig. 2d). In contrast to APC/C,
WDR5 engaged USP44 also during interphase (Extended Data Fig. 3i

Fig. 3 | APC/C-WDR5 decorates histone proteins with K11/K48-branched
ubiquitin chains. a, Mass spectrometry of WDR5-HHR23B and WDR5-UBQLN2
traps identifies histones as candidate substrates. Traps were affinity-purified
from prometaphase (PM) or anaphase (ANA) HeLa cells with low or
high APC/C activity, respectively. TSC, total spectral counts. b, APC/C-CDC20
purified from mitotic HeLa S3 cells ubiquitylates recombinant human (Homo
sapiens, Hs) H2A-H2B dimers. This experiment was performed four
independent times with similar results. c, APC/C-WDR5 ubiquitylates H2B in
polynucleosomes (nuc.) purified from HeLa cells and is inhibited by the APC/C
inhibitor EMI1. This experiment was performed three independent times with
similar results. d, APC/C-WDR5 ubiquitylates multiple Lys residues in histones,
as seen with Lys-free ubiquitin (K0). This experiment was performed two
independent times with similar results. e, APC/C-dependent ubiquitylation of


WDR5 uses distinct surfaces to recognize WDR5-binding motifs
(WBMs) and WDR5-interacting (WIN) motifs. Disrupting the ability
of WDR5 to bind WIN motifs (WDR5(ΔWIN)) blocked the association of
WDR5 with APC/C and USP44, but not with TBP (Extended Data Figs. 3i,
4a-c). Accordingly, the compound MM-102, which targets the site on
WDR5 that binds the WIN motif, prevented WDR5 from binding APC/C
(Extended Data Fig. 4d), and WDR5(ΔWIN) did not sustain human ES
cell pluripotency (Extended Data Fig. 4e). The ability of WDR5 to detect
WBMs is not required for APC/C recognition, but is needed to bind TBP
(Extended Data Fig. 4a, c).

Crosslinking experiments revealed that WDR5, but not WDR5(ΔWIN),
binds APC/C close to CDC20 and the catalytic site that is composed
of APC2 and APC11 (Extended Data Fig. 5a). Using in vitro translation,
we identified APC2 as a specific binding partner of WDR5 (Extended
Data Fig. 5b, c). We confirmed these findings by negative-stain electron
microscopy, which showed that WDR5 is situated near CDC20 and docks
against APC2 and APC11 (Fig. 2e).

H2B requires CDC20 in vitro. APC/C was purified from mitotic HeLa cells
depleted of CDC20 or WDR5. This experiment was performed once.
f, Ubiquitylation of H2B by APC/C is dependent on UBE2C and UBE2S, and
inhibited by EMI1. This experiment was performed two independent times with
similar results. g, Endogenous H2B is modified with K11/K48-branched chains,
as seen by denaturing purification from synchronized HeLa cells. This
experiment was performed three independent times with similar results. STLC,
S-trityl-cysteine. h, Mitotic K11/K48 modification of endogenous H2B in
human ES cells is dependent on UBE2C and UBE2S. This experiment was
performed two independent times with similar results. i, Proteasome
inhibition stabilizes mitotic K11/K48-modified H2B in H1 human ES cells. This
experiment was performed two independent times with similar results. CFZ,
carfilzomib.


Despite the proximity of WDR5 to the active site of APC/C, we could
not detect APC/C-dependent ubiquitylation of WDR5, nor did excess
WDR5 prevent the modification of APC/C substrates (Extended Data
Fig. 6a, b). Instead, mitotic WDR5 complexes, which contain APC/C
(Fig. 2b, Extended Data Fig. 3a), supported the in vitro ubiquitylation
of canonical APC/C substrates (Fig. 2f, Extended Data Fig. 6c).
Mitotic WDR5 also coprecipitated K11-linked chains produced in cells
(Extended Data Fig. 6d), which was dependent upon UBE2S (Extended
Data Fig. 6e). We conclude that WDR5 binds active APC/C without being
ubiquitylated itself, which suggests that WDR5 is a coadaptor that
delivers APC/C to specific (probably chromatin-bound) substrates.


APC/C-WDR5 polyubiquitylates histones

To identify substrates of the APC/C-WDR5 complex, we used an
approach that was previously established for SCF E3 ligases. We fused
WDR5 to the ubiquitin-binding domains of HHR23B or UBQLN2, which

Fig. 4 | APC/C-dependent ubiquitylation occurs at TSSs of human ES cell
genes. a, Genome browser track of E2F3. MNase ChIP-seq of indicated
antibodies were performed from mitotic human ES cells. b, K11 is deposited
at select TSSs co-occupied by WDR5 in human ES cells. Heat map of co-occupied
genes at TSSs from MNase ChIP-seq experiments of indicated
antibodies. H1 human ES cells were collected after STLC treatment (mitosis)
and after an 8-h release (late G1/S phase). CPM, counts per million. c, Genome
browser track of E2F3 from MNase ChIP-seq of anti-K11 in human ES cells
throughout a mitotic release. d, Flow cytometry analysis of H1 human ES cells
upon mitotic synchronization and release into fresh medium (top). Metagene
analysis of K11- and WDR5-occupied TSSs (middle). Heat map of individual K11-
and WDR5-occupied TSSs from anti-K11 MNase ChIP-seq experiments
throughout a mitotic release (bottom). e, Anti-K11 MNase ChIP-qPCR validates
MNase ChIP-seq findings that K11 is deposited only during mitosis in H1 human
ES cells. The same extract used in e was used for this experiment. f, Depletion of
CDC20 or WDR5 causes robust depletion of K11 chains at select TSSs. g, MNase
ChIP-seq from HUES64 human ES cells reveals that endogenous targets of
APC/C-WDR5 are strongly enriched in binding sites for MYC, OCT4 and
NANOG. h, Loss of APC/C-WDR5 function interferes with expression of genes
marked with K11-linked chains in H1 human ES cells. Poly(A)-selected RNA was
purified from asynchronous H1 human ES cells transfected with control siRNA
or siRNA against WDR5 for 48 h and subjected to RNA sequencing. TPM,
transcripts per million. i, Real-time qPCR analysis of nascent RNA reveals
APC/C-WDR5 target genes are reactivated upon mitotic exit dependent on
WDR5. Mitotic H1 human ES cells were treated with or without 50 µM MM-102
and supplemented with 20 µM Z-VAD-FMK. Cells were released into fresh
medium containing DMSO or 50 µM MM-102. Real-time qPCR experiments were
performed with oligonucleotides spanning intron-exon junctions. Values
represent the mean of independent replicates ± s.e.m. (n = 3 for t = 15 min, n = 4
for t = 30, 60 and 480 min and n = 5 for t = 0, 120 and 240 min).


detect K11/K48-branched chains produced by APC/C, and purified
both constructs under conditions of low or high APC/C activity. Ubiquitylated
substrates were expected to be trapped by both fusions in
cells with active APC/C. These experiments identified histones as likely
APC/C-WDR5 substrates (Fig. 3a).


In vitro reconstitution using human histone H2A-H2B dimers and
H3-H4 tetramers, or Xenopus laevis H2A-H2B dimers and octamers,
revealed efficient APC/C-dependent ubiquitylation of H2A, H2B and
H3, but not of H4 (Fig. 3b, Extended Data Fig. 7a-c). H2A-H2B dimers,
octamers and polynucleosomes were also strongly ubiquitylated by
WDR5-bound APC/C and by endogenous APC/C purified from human
ES cells (Fig. 3c, Extended Data Fig. 7b-d). Histone polyubiquitylation
occurred at multiple sites (Fig. 3d), including K120 of H2B, the
monoubiquitylation of which leads to transcriptional activation and
is negatively regulated by USP44.

In contrast to mitotic APC/C, APC/C obtained from asynchronous or
S-phase cells did not modify histones (Extended Data Fig. 7e). APC/C-dependent
polyubiquitylation of histones was also blocked by the
depletion of CDC20 (the mitotic coactivator of APC/C), by the addition
of the APC/C inhibitor EMI1 or by mutation of K11 of ubiquitin (Fig. 3e, f,
Extended Data Fig. 7f, g). H2B ubiquitylation was outcompeted by a
canonical APC/C substrate, but less so by a D-box mutant substrate
(Extended Data Fig. 7h), which indicates that histones are recognized
by the D-box coreceptor composed of CDC20 and APC10.

Denaturing purifications of K11/K48-branched chains revealed abundant
ubiquitylation of endogenous H2B during early mitosis, at a time
when CDC20 is decorated with such conjugates (Fig. 3g). Underscoring
the role of APC/C, H2B modification with K11/K48-linked chains was
strongly reduced by UBE2C and UBE2S depletion (Fig. 3h). Ubiquitylated
H2B accumulated upon proteasome inhibition (Fig. 3i, Extended
Data Fig. 7i), consistent with K11/K48-branched conjugates targeting
proteins for degradation. We conclude that APC/C-WDR5 modifies
multiple histones with K11/K48-branched ubiquitin chains during
mitosis.


APC/C acts at transcription start sites

As total histone levels did not drop during mitotic exit (Extended Data
Fig. 8a), we hypothesized that APC/C-WDR5 targets histones at select
chromosome locations. To identify this population, we performed
genome-wide micrococcal-nuclease chromatin immunoprecipitation
with sequencing (MNase ChIP-seq) analysis of K11-linked chains, WDR5
and TBP in prometaphase human ES cells. Because the vast majority of
K11 linkages are assembled during mitosis by APC/C, tracking this
type of chain enabled us to monitor APC/C even if it interacted with its
targets only transiently. MNase was used, as sonication fragmented
polymeric ubiquitin chains and reduced the specific ChIP-seq signal
(Extended Data Fig. 8b).

Notably, K11-linked and K11/K48-branched chains (that is, active
APC/C) accumulated at specific genes in mitotic human ES cells that
were co-occupied by WDR5 and TBP (Fig. 4a, b, Extended Data Fig. 8c-e).
Chromatin-bound K11-linked chains were abundant during early mitosis
(when APC/C is activated by CDC20), but were undetectable during
late G1 or early S phase, when APC/C is inactive (Fig. 4c-e). By contrast,
WDR5 and TBP were found at these promoters throughout the cell cycle
(Fig. 4b). Depletion of CDC20, UBE2S or WDR5, and chemical inhibition
of WDR5, strongly reduced K11-linked chains at APC/C-WDR5 target
genes (Fig. 4f, Extended Data Fig. 8f, g). By heterologous expression of
CDC20 and WDR5, we showed that mitotic APC/C-WDR5 also associated
with specific genes in somatic cells (Extended Data Fig. 8h).

The majority of APC/C-WDR5 target sites were within 100 base pairs
of the transcription start site (TSS); this location contains TBP-binding
sites, as we confirmed for select targets by ChIP with quantitative PCR
(ChIP-qPCR) (Extended Data Fig. 8i, j). Gene ontology (GO) analyses
revealed that most APC/C-WDR5 target genes encode proteins that are
involved in ribosome function (GO: 0003735, P = 1.2 × 10*) and mRNA
translation (GO: 0006413, P = 2.2 × 10). These genes are among the
very first to be expressed upon mitotic exit, dependent upon WDR5
and MYC. Accordingly, APC/C-WDR5 target genes were strongly
bound by the stem-cell transcription factors MYC, OCT4 and NANOG
(Fig. 4g, Extended Data Fig. 8k), whereas transcription factors linked
to differentiation did not accumulate at these sites (Extended Data
Fig. 9). When we compared the set of APC/C-WDR5 target genes from
HEK293T cells with gene-expression profiles, we noticed strong overlaps
with human ES cell lines (Extended Data Fig. 10a).
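
The statement that most target sites lie within 100 base pairs of a TSS can be illustrated with a minimal Python sketch. This is not the analysis pipeline used in the study: the function names, the use of peak summit positions on a single chromosome and the example coordinates are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence


def distance_to_nearest_tss(position: int, sorted_tss: Sequence[int]) -> int:
    """Return the distance (bp) from `position` to the closest TSS.

    `sorted_tss` must be sorted in ascending order and non-empty.
    """
    if not sorted_tss:
        raise ValueError("at least one TSS coordinate is required")
    i = bisect_left(sorted_tss, position)
    # Only the TSS just before and just after the position can be nearest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - t) for t in candidates)


def fraction_within_window(peaks: Sequence[int], tss: Sequence[int],
                           window: int = 100) -> float:
    """Fraction of peak summits lying within `window` bp of any TSS."""
    if not peaks:
        raise ValueError("at least one peak is required")
    sorted_tss: List[int] = sorted(tss)
    near = sum(1 for p in peaks
               if distance_to_nearest_tss(p, sorted_tss) <= window)
    return near / len(peaks)


if __name__ == "__main__":
    # Hypothetical coordinates on one chromosome (illustration only).
    example_tss = [1000, 5000]
    example_peaks = [950, 1100, 1101, 4990]
    print(fraction_within_window(example_peaks, example_tss, window=100))
```

A genome-wide version would apply the same calculation per chromosome and take strand into account; both are omitted here for brevity.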

Given the enrichment of APC/C-WDR5 at the TSSs of pluripotency
genes and the requirement for this complex for self-renewal, we asked
whether APC/C-WDR5 controls the transcription of its target genes.
Notably, depletion of WDR5 strongly downregulated only those genes
that were marked by K11-linked chains, WDR5 and TBP during mitosis
(Fig. 4h, Extended Data Fig. 10b, c). Real-time qPCR analyses of nascent
mRNAs using oligonucleotides that span intron-exon junctions
showed that APC/C-WDR5 target genes were expressed immediately
upon mitotic exit, dependent on WDR5 (Fig. 4i, Extended Data Fig. 10d).
APC/C-WDR5 target genes are expressed at high levels (Extended Data
Fig. 10e), and hence are particularly reliant on rapid reactivation after mitosis.
Polyubiquitylation by APC/C-WDR5 therefore promotes early postmitotic
expression of genes controlled by stem cell transcription factors.
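
The real-time qPCR values in Fig. 4i are reported as the mean of independent replicates ± s.e.m. A minimal Python sketch of that summary statistic follows; the function name and the example values are hypothetical and do not come from the study.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for replicate values.

    The s.e.m. is the sample standard deviation (n - 1 denominator)
    divided by the square root of the number of replicates.
    """
    n = len(values)
    if n < 2:
        raise ValueError("s.e.m. needs at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


if __name__ == "__main__":
    # Hypothetical relative expression values from three replicates.
    mean, sem = mean_sem([1.0, 2.0, 3.0])
    print(f"{mean:.2f} ± {sem:.2f}")
```

Because the number of replicates differs between time points (n = 3 to 5 in Fig. 4i), the s.e.m. would be computed separately for each time point.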


APC/C recruits p97 and the proteasome 

Consistent with K11/K48-branched chains recruiting the cellular degradation
machinery, the p97 adaptor UBXN7 and proteasome subunits
scored in our screen (Fig. 1b). The p97-UBXN7 complex captured
K11/K48-modified H2B in vitro (Extended Data Fig. 10f) and strongly
bound K11/K48-ubiquitylated H2B in cells (Extended Data Fig. 10g).
Moreover, p97 inhibition by NMS-873 caused the same strong increase
in K11/K48-ubiquitylation of H2B as seen with proteasome inhibition
(Extended Data Fig. 10h). Both MNase ChIP-seq and ChIP-qPCR experiments
revealed that p97 and the proteasome were required for the loss of
ubiquitylated proteins from the TSSs of APC/C-WDR5 target genes upon
mitotic exit (Extended Data Fig. 10i, j). These findings suggest that APC/C-WDR5
might act by destabilizing histones at specific TSSs during mitosis.


Discussion 

Our findings reveal a mechanism for how cell identity is preserved
through cell division (Extended Data Fig. 10k). WDR5 and TBP bind
promoters of genes transcribed in interphase. When cells enter mitosis,
WDR5 and TBP remain associated with their targets but, instead of
recruiting RNA polymerase II, they deliver APC/C to TSSs demarcated by
the pluripotency factors MYC, OCT4 and NANOG. At these TSSs, APC/C
decorates histones with K11/K48-branched chains, which attract p97
and the proteasome. We propose that subsequent histone degradation
opens the TSSs for the rapid postmitotic expression of pluripotency
genes. As it also triggers mitotic exit, APC/C therefore tightly coordinates
cell division and gene-expression regulation.

The newly identified cofactor WDR5 binds APC/C through the same
surface as it uses to engage the MLL1 methyltransferase, another regulator
of postmitotic gene expression. Histone methylation might
strengthen the interaction of WDR5 with promoters, which could facilitate
subsequent recruitment of APC/C. WDR5 also engages OCT4, MYC
and TFIID-TBP, all of which bind APC/C-WDR5 target genes and have
vital roles in mitotic bookmarking. WDR5 thus appears to orchestrate
distinct steps in the regulation of mitotic gene expression by mediating
the recruitment of transcription factors, histone methylation and
nucleosome destabilization.

Partial APC/C inhibition in neural progenitors triggered cell differentiation
similar to that noted upon loss of APC/C-WDR5 in human ES
cells. Conversely, cellular reprogramming and somatic-cell nuclear
transfer are more efficient during mitosis, at times that coincide
with APC/C-WDR5-dependent histone ubiquitylation. This further
implies a role for APC/C-WDR5 in pluripotency control, which comes
with practical implications: if APC/C-WDR5 acts in cancer stem cells as
in human ES cells, combinations of APC/C and WDR5 inhibitors might
impede the self-renewal of disease-driving cell populations and should
be tested for their efficiency in cancer therapy.

Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment.


Mammalian cell culture 

Human embryonic kidney (HEK)293T and HeLa cells were maintained
in DMEM plus 10% fetal bovine serum. Plasmid transfections were
performed using polyethylenimine (PEI) at a 1:3 ratio of DNA (in µg)
to PEI (in µl at a 1 mg ml⁻¹ stock concentration). siRNA transfections
were performed using 40 nM of indicated siRNAs and a 1:400 dilution
of RNAiMAX transfection reagent (Thermo Fisher, 13778150).
Lentiviruses were produced in HEK293T cells by cotransfection of
lentiviral and packaging plasmids using Lipofectamine 2000 transfection
reagent (Thermo, 11668027). Viruses were collected 48 h after
transfection, concentrated using the Lenti-X concentrator (Takara,
631232), aliquoted, and stored at -80 °C for later use. HEK293T
cells were purchased directly from the Berkeley Cell Culture Facility
(authenticated by short tandem repeat analysis). HeLa cells were
not authenticated.

Human ES cells (WiCell, WA01/H1) were grown in mTeSR1 medium
(StemCell Technologies, 85850) on human-ES-cell-qualified Matrigel-coated
plates (Corning, 354277) with daily medium change. H1 cells
were passaged by collagenase (StemCell Technologies, 07909) for
routine maintenance or accutase (StemCell Technologies, 07920) for
siRNA transfections, lentiviral infections or when single cells were
required. For siRNA transfections, single-cell suspensions of H1 cells
were generated by accutase treatment and 2-5 x 10° cells were seeded
on a Matrigel-coated well of a 6-well plate with 1.8 ml of mTeSR1 containing
10 µM of Y-27632 (StemCell Technologies, 72308) and a 0.2 ml
mixture of indicated siRNAs (at a final concentration of 40 nM) and
1:400 dilution of RNAiMAX transfection reagent buffered in Opti-MEM.
For lentiviral infections, single-cell suspensions of H1 cells were
generated by accutase treatment and 1.5-3 x 10° cells were seeded on
a Matrigel-coated well of a 6-well plate with 2 ml of mTeSR1 containing
10 µM of Y-27632, polybrene (at a final concentration of 6 µg ml⁻¹),
and lentiviruses produced from HEK293T cells for 2 h. The medium
was immediately exchanged with 2 ml of fresh mTeSR1 containing
10 µM of Y-27632 only. Human ES cells were drug-selected 24-48 h after
infection. H1 cells were positive for OCT4 and NANOG expression and
karyotype analysis showed no chromosomal anomalies.
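
The transfection ratios above reduce to simple dilution arithmetic, sketched below in Python. The function names are hypothetical, the 20 µM siRNA stock concentration is an assumed value that the text does not state, and the 1:400 reagent dilution is interpreted here as relative to the final culture volume, which is also an assumption.

```python
def pei_volume_ul(dna_ug: float, pei_ul_per_ug_dna: float = 3.0) -> float:
    """PEI volume (µl of a 1 mg/ml stock) for a 1:3 DNA (µg) to PEI (µl) ratio."""
    return dna_ug * pei_ul_per_ug_dna


def sirna_stock_volume_ul(final_nm: float, total_ml: float,
                          stock_um: float) -> float:
    """Volume of siRNA stock (µl) that gives `final_nm` in `total_ml`.

    Uses C1 * V1 = C2 * V2, converting nM to µM and ml to µl.
    `stock_um` is the stock concentration in µM (an assumed input).
    """
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return (final_nm / 1000.0) * (total_ml * 1000.0) / stock_um


def reagent_volume_ul(total_ml: float, dilution: float = 400.0) -> float:
    """Reagent volume (µl) for a 1:`dilution` dilution in `total_ml`."""
    return total_ml * 1000.0 / dilution


if __name__ == "__main__":
    # 5 µg of plasmid DNA at the 1:3 DNA:PEI ratio.
    print(pei_volume_ul(5.0))
    # One well of a 6-well plate: 1.8 ml medium + 0.2 ml siRNA mixture.
    print(sirna_stock_volume_ul(final_nm=40.0, total_ml=2.0, stock_um=20.0))
    print(reagent_volume_ul(total_ml=2.0))
```

Under these assumptions, one 2 ml well would take about 4 µl of a 20 µM siRNA stock and 5 µl of transfection reagent.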

All cell lines were routinely tested for mycoplasma contamination
using the MycoAlert mycoplasma detection kit (Lonza, LT07-218). All
cell lines tested negative for mycoplasma.


Generation of OCT4-eGFP-P2A-PURO* human ES cells

The OCT4 locus was targeted for gene editing in H1 cells by TALE
nucleases as previously described. An in-frame fusion, consisting of
enhanced GFP (eGFP) followed by the self-cleaving P2A peptide and
the puromycin resistance gene (puromycin N-acetyltransferase), was
generated at the C terminus of the OCT4 locus. In brief, single-cell suspensions
of H1 cells were generated by accutase treatment and 110”
cells were resuspended in ice-cold 1× PBS with 40 µg of the DONOR
plasmid and 5 µg each of the TALEN plasmids (T4 and T8). Cells were
electroporated in a 0.4-cm cuvette at 250 V and 500 µF with the Gene
Pulser II electroporating system (Bio-Rad). Electroporated cells were
immediately resuspended in mTeSR1, washed to remove lysed debris,
and seeded on 2 Matrigel-coated 15-cm plates in mTeSR1 containing
10 µM of Y-27632. H1 cells were selected for 10-14 days with puromycin
(at a final concentration of 0.5 µg ml⁻¹) 72 h after electroporation.
Colonies were manually scored and transferred to fresh plates. A single
allele of the OCT4 locus was fused with the eGFP-P2A-PURO* cassette as
verified by Southern blot analysis (data not shown). Karyotype analysis
was performed by WiCell.


Neural conversion of human ES cells

Neural induction of human ES cells was performed as previously
described, using STEMdiff Neural Induction Medium (StemCell
Technologies, 05839). Single-cell suspensions of H1 cells were generated
by accutase treatment and 1.5 10° cells were seeded in a well of a
6-well plate with 4 ml of STEMdiff neural induction medium containing
10 µM Y-27632. Cells were treated with daily medium changes, and
collected when indicated.


Ultracomplex shRNA screen

The shRNA library was constructed as previously described. In
brief, the shRNA library was divided into four sublibraries, cloned
into lentiviral expression vectors and transfected into HEK293T cells
with TransIT-293 transfection reagent (Mirus, MIR 2700) for virus
production. Human ES cells were infected with lentiviruses overnight
and cultured in mTeSR1 for six days or in mTeSR1 for six days followed
by STEMdiff neural induction medium for one day. Human ES
cells were then sorted by fluorescence-activated cell sorting using
an INFLUX cell sorter (BD) at the Flow Cytometry Core Facility at
UC Berkeley. Cells were sorted on the basis of the strength of their
GFP expression into three populations. Sequencing libraries were
prepared from sorted cells as previously described, sequenced on
a HiSeq 2000 (Illumina) and analysed using previously described
scripts.
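
The analysis scripts themselves are not reproduced here. As an illustration only, the following Python sketch shows the kind of two-sided Mann-Whitney U test named in the legend of Fig. 1b, comparing the shRNAs that target one gene with the negative-control shRNAs. The input values (for example, a per-shRNA enrichment score between the low and high OCT4-GFP populations), the function name and the use of a normal approximation with tie and continuity corrections are assumptions of this sketch, not details taken from the study.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(sample: Sequence[float],
                   control: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of `sample`, two-sided P value).

    The P value uses the normal approximation with tie correction and
    continuity correction; it is approximate for small groups.
    """
    n1, n2 = len(sample), len(control)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one value")
    # Pool and sort the values, remembering which group each came from.
    pooled = sorted([(v, 0) for v in sample] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1  # size of this group of tied values
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = max((abs(u - mean_u) - 0.5) / math.sqrt(var_u), 0.0)
    return u, min(math.erfc(z / math.sqrt(2)), 1.0)


if __name__ == "__main__":
    # Hypothetical enrichment scores (illustration only).
    gene_shrnas = [1.0, 2.0, 3.0]
    negative_controls = [4.0, 5.0, 6.0, 7.0]
    print(mann_whitney_u(gene_shrnas, negative_controls))
```

In a screen of this kind the test would be repeated for every gene; as stated in the legend of Fig. 1b, the reported P values were not corrected for multiple hypothesis testing.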


Cell synchronization 

HeLa cells were first synchronized in S phase by addition of thymidine
(at a final concentration of 2 mM) for 24 h. S-phase cells were washed
with 1× PBS to remove excess thymidine and released into fresh medium
(DMEM/10% FBS) for 3 h. To arrest cells in prometaphase, released cells
were treated with STLC (Sigma, 164739) (at a final concentration of
5 µM) for 12-14 h. Finally, prometaphase cells were collected by vigorous
pipetting, washed with 1× PBS and used for downstream applications,
including immunoprecipitation assays and/or western blot analyses,
or frozen in liquid nitrogen and stored at -80 °C for later use. For cell-cycle
studies, prometaphase cells were released into fresh medium
and collected at the indicated time points. For drug inhibition studies,
cells were released into medium containing 2 µM carfilzomib (Selleck,
PR-171), 20 µM (R)-MG132 (Cayman, 13697) and/or 10 µM NMS-873
(Sigma, SML1128) for indicated times. For depletion studies, HeLa cells
were transfected with 40 nM of indicated siRNAs and a 1:400 dilution of
RNAiMAX transfection reagent (Thermo Fisher, 13778150) 24 h before
synchronization.
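
The HeLa synchronization steps can be laid out as a simple schedule. The Python sketch below only restates the durations given above and sums them; the class and variable names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a synchronization protocol, with its duration range in hours."""
    name: str
    min_h: float
    max_h: float


# Durations as stated in the text for prometaphase arrest of HeLa cells.
HELA_PROMETAPHASE_ARREST = [
    Step("thymidine block (2 mM)", 24, 24),
    Step("release into fresh medium", 3, 3),
    Step("STLC arrest (5 µM)", 12, 14),
]


def total_duration_h(steps: Sequence[Step]) -> Tuple[float, float]:
    """Return the (shortest, longest) total duration of the protocol in hours."""
    return sum(s.min_h for s in steps), sum(s.max_h for s in steps)


if __name__ == "__main__":
    shortest, longest = total_duration_h(HELA_PROMETAPHASE_ARREST)
    print(f"Prometaphase cells are collected {shortest}-{longest} h after the block starts")
```

For depletion studies, the siRNA transfection described above would precede the first step by a further 24 h.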

Mitotic enrichment of HEK293T cells and H1 cells was achieved by
adding STLC (at a final concentration of 5 µM) to the culture medium
for 14-16 h.


Purification of APC/C and APC/C-WDR5 complexes

Human APC/C and APC/C-WDR5 complexes were purified from HeLa
extracts synchronized in prometaphase (see 'Cell synchronization').
To purify APC/C-WDR5, HeLa cells were first PEI-transfected with 5 µg
of pCMV 3×Flag-WDR5 (per 15-cm plate) for 24 h before synchronization.
Collected prometaphase pellets were lysed in lysis buffer (20 mM
HEPES, pH 7.4, 5 mM KCl, 150 mM NaCl, 1.5 mM MgCl2, 0.1% Nonidet
P-40, 1× cOmplete protease inhibitor cocktail (Roche, 04693159001)
and 1 of benzonase (Millipore, 70746) per 15-cm plate). Detergent-lysed
cells were then subjected to a high-speed spin (20,000g) to
remove cellular debris and the clarified extract was precleared with
protein G-agarose resin (Roche, 11719416001). APC/C was purified with
anti-CDC27 antibody (sc-9972, SCBT) precoupled to protein G-agarose
resin for 3 h at 4 °C, and APC/C-WDR5 was purified with anti-Flag M2
affinity resin (Sigma, A220) for 1.5 h at 4 °C. APC/C-coupled beads
were washed 5× with lysis buffer (minus inhibitors and benzonase)
before use.

Purification of recombinant proteins

WDR5 and WDR5(ΔWIN) were cloned into a pMAL expression vector
containing a C-terminal 6×His tag and expressed in Escherichia
coli BL21-CodonPlus (DE3)-RIL cells. Transformed cells were grown at
37 °C to an optical density at 600 nm (OD600) of 0.5 in LB broth containing
100 µg ml⁻¹ ampicillin, 34 µg ml⁻¹ chloramphenicol and 0.2%
glucose, chilled on ice for 30 min, induced with 100 µM isopropyl β-D-1-thiogalactopyranoside
(IPTG) for 6 h at 16 °C, and collected by centrifugation.
Collected cells were resuspended with lysis buffer (20 mM
HEPES, pH 7.4, 300 mM NaCl, 2 mM 2-mercaptoethanol (BME), 1 mM
EDTA, 10% glycerol, 0.2 mg ml⁻¹ lysozyme, 1 mM phenylmethylsulfonyl
fluoride (PMSF) and 0.1% Triton X-100), incubated on ice for 30 min,
sonicated and clarified by high-speed centrifugation. The clarified
extract was supplemented with 20 mM imidazole and bound to Ni-NTA
resin (Qiagen, R90110) (2 ml of slurry per 1 l of bacterial culture) for 1 h
at 4 °C. The resin was then washed 5× with wash buffer (20 mM HEPES,
pH 7.4, 300 mM NaCl, 2 mM BME, 1 mM EDTA, 10% glycerol and 20 mM
imidazole) and eluted 2× with elution buffer (20 mM HEPES, pH 7.4,
300 mM NaCl, 2 mM BME, 1 mM EDTA, 10% glycerol and 300 mM imidazole).
The elutions were pooled, dialysed overnight in dialysis buffer
(20 mM HEPES, pH 7.4, 300 mM NaCl, 2 mM BME, 1 mM EDTA and 10%
glycerol), concentrated, aliquoted, snap-frozen in liquid nitrogen and
stored at -80 °C for later use.

Securin and its variants were cloned into a pET28 expression vector
containing an N-terminal 6×His tag followed by a TEV-protease cleavage
site and expressed in LOBSTR BL21(DE3)-RIL cells. Transformed cells
were grown at 37 °C to an OD₆₀₀ of 0.5 in LB broth containing
100 µg ml⁻¹ ampicillin and 34 µg ml⁻¹ chloramphenicol, chilled on
ice for 30 min and induced with 100 µM IPTG for 14-16 h at 16 °C.
Induced cells were centrifuged, resuspended in lysis buffer (20 mM
HEPES, pH 7.4, 300 mM NaCl, 2 mM BME, 10% glycerol, 0.2 mg ml⁻¹
lysozyme, 1 mM PMSF and 0.1% Triton X-100), incubated on ice for
30 min, sonicated and clarified by high-speed centrifugation. The
clarified extract was supplemented with 20 mM imidazole and bound
to Ni-NTA resin (2 ml of slurry per 1 l of culture) for 1 h at 4 °C.
The resin was then washed 5× with wash buffer (20 mM HEPES, pH 7.4,
300 mM NaCl, 2 mM BME, 10% glycerol, 0.1% Triton X-100 and 20 mM
imidazole) and eluted by TEV cleavage. The eluate was desalted using
a PD10 column, concentrated, aliquoted, snap-frozen and stored at
-80 °C for later use.

p97 was cloned into a pMAL expression vector and expressed in
BL21-CodonPlus (DE3)-RIL cells. Transformed cells were grown at 37 °C
to an OD₆₀₀ of 0.5 in LB broth containing 100 µg ml⁻¹ ampicillin and
34 µg ml⁻¹ chloramphenicol, chilled on ice for 30 min and induced
with 0.5 mM IPTG overnight at 18 °C. Induced cells were centrifuged,
resuspended in lysis buffer (20 mM Tris 7.4, 300 mM NaCl, 5% glycerol,
0.2 mg ml⁻¹ lysozyme, 1 mM PMSF and 0.1% Triton X-100), incubated
on ice for 30 min, sonicated and clarified by high-speed
centrifugation. The clarified extract was bound to amylose resin (NEB,
E8021) (2 ml of slurry per 1 l of culture) for 45 min at 4 °C. The
resin was then washed 3× with 1× PBS, resuspended in 1× PBS containing
2 mM DTT, and stored at 4 °C for up to 1 month. Recombinant 6×His-p47
(also known as NSFL1C) and 6×His-UBXN7 were purified using previously
described methods.


In vitro transcription and translation

All in vitro synthesized substrates were cloned under the SP6 promoter.
The corresponding plasmids can be found in Supplementary Table 1.
³⁵S-labelled substrates were generated by incubating plasmid DNA
(400 ng) in 20 µl of rabbit reticulocyte lysate (Promega, L2080)
supplemented with 2 µl of ³⁵S-Met (PerkinElmer, NEGOO9HOOIMC) for
1 h at 30 °C. Reactions were terminated by rapid dilution with 1× PBS.
³⁵S-labelled substrates were used for in vitro ubiquitylation assays
and/or MBP binding studies.


In vitro ubiquitylation

In vitro ubiquitylation assays were performed in a 10 µl reaction
volume: 0.25 µl of 10 µM E1 (250 nM final), 1 µl of 10 µM UBE2C (1 µM
final), 1 µl of 10 µM UBE2S (1 µM final), 1 µl of 10 mg ml⁻¹ ubiquitin
(1 mg ml⁻¹ final) (Boston Biochem, U-100H), 1 µl of 100 mM DTT, 1.5 µl
of energy mix (150 mM creatine phosphate, 20 mM ATP, 20 mM MgCl₂,
2 mM EGTA, pH to 7.5 with KOH), 2.25 µl of 1× PBS, 1 µl of 10×
ubiquitylation assay buffer (250 mM Tris 7.5, 500 mM NaCl and 100 mM
MgCl₂) and 3 µl of substrate (in vitro translated or recombinant) were
premixed and added to APC/C- or APC/C-WDR5-purified bed resin (see
‘Purification of APC/C and APC/C-WDR5 complexes’). Reactions were
performed at 30 °C with shaking for 30 min, unless noted otherwise.
Reactions were stopped by adding 2× urea sample buffer and resolved on
SDS-acrylamide gels. E1, UBE2C and UBE2S were purified as previously
described. Recombinant human H2A-H2B dimers (NEB, M2585), recombinant
X. laevis H2A-H2B dimers and octamers, recombinant human H3-H4
tetramers (NEB, M25095) or purified human nucleosomes (EpiCypher,
16-0003) were used at a final concentration of 500 nM.
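
The final enzyme concentrations quoted for this reaction follow from simple dilution arithmetic (C1·V1 = C2·V2). The following is a minimal sketch of that check, assuming a 10 µl reaction volume (the value implied by the stated stock volumes and final concentrations); the function and variable names are illustrative and are not part of the published protocol.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as stock_conc (C1*V1 = C2*V2)."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


# Assumed total reaction volume in microlitres (hypothetical constant).
REACTION_VOLUME_UL = 10.0

# (stock concentration in uM, volume added in ul), as listed in the protocol text.
ENZYMES = {
    "E1": (10.0, 0.25),
    "UBE2C": (10.0, 1.0),
    "UBE2S": (10.0, 1.0),
}

# Final concentration of each enzyme in uM.
final_um = {
    name: final_concentration(conc, vol, REACTION_VOLUME_UL)
    for name, (conc, vol) in ENZYMES.items()
}

if __name__ == "__main__":
    for name, conc in final_um.items():
        print(f"{name}: {conc * 1000:.0f} nM final")
```

Under that assumption, E1 comes out at 250 nM and both E2 enzymes at 1 µM, in agreement with the values given in the text.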


MBP binding studies 
For in vitro transcription and translation binding assays, 10 µl of
³⁵S-labelled substrate was diluted to 400 µl with prechilled 1× PBS
containing 0.1% Nonidet P-40 and mixed with 2 µl of mg ml⁻¹ of
MBP-fused bait (see ‘Purification of recombinant proteins’) and
8 µl of amylose slurry (NEB, E8021). The binding was performed for
2 h at 4 °C with mixing, and the amylose resin was subsequently washed
3× with 1× PBS. The bound prey was eluted with 2× urea sample buffer,
resolved on an SDS-acrylamide gel and visualized by a Typhoon scanner.
For coadaptor-bound p97 binding studies, coadaptor-bound p97
resin was made by mixing 0.1 ml of p97-coupled amylose slurry (see
‘Purification of recombinant proteins’) with 0.2 ml of recombinant
6×His-p47 or 6×His-UBXN7 and 0.3 ml of 1× PBS containing 4 mM
DTT for 45 min at 4 °C. The resin was washed 3× with 1× PBS containing
2 mM DTT and stored at 4 °C for up to 2 weeks. Ubiquitylated H2A-H2B
dimers (see ‘In vitro ubiquitylation’) were added to 6 µl of
coadaptor-bound p97 slurry brought up in 0.6 ml of 1× PBS, incubated
for 20 min at 4 °C, washed 5× with 1× PBS, eluted with 2× urea sample
buffer and resolved on an SDS-acrylamide gel.


Crosslinking studies 

APC/C complexes were first purified from HeLa cells synchronized
in prometaphase. Before crosslinking, a 200 µM working stock of
the sulfhydryl-reactive and homobifunctional crosslinker
1,4-bismaleimidobutane (BMB) was prepared in DMSO and a 20 µM
solution of recombinant MBP-WDR5 was pretreated with
tris(2-carboxyethyl)phosphine (TCEP) (at a final concentration of mM)
in a 20 µl reaction volume. Ten microlitres of purified APC/C slurry
(see ‘Purification of APC/C and APC/C-WDR5 complexes’) was mixed with
TCEP-treated MBP-WDR5 (at a final concentration of 2 µM) and BMB (at a
final concentration of 20 µM) and incubated for 30 min at 22 °C with
shaking. Reactions were stopped by adding 2× urea sample buffer and
resolved on SDS-acrylamide gels.


K11/K48 denaturing immunoprecipitation
Denaturing K11/K48-linked ubiquitin immunoprecipitations were
performed from cells arrested in prometaphase. Three 15-cm plates
of confluent cells were collected and lysed in equal pellet volume with
urea lysis buffer (20 mM Tris 7.5, 135 mM NaCl, 10% glycerol, 8 M urea,
1% Triton X-100, 5 µM carfilzomib (Selleck, PR-171), 10 mM
N-ethylmaleimide (NEM), 1× PhosSTOP (Roche, 4906837001) and 1×
cOmplete protease inhibitor cocktail (Roche, 04693159001)), rotated
for 1 h at room temperature, sonicated with a microtip sonicator
(15 pulses at 50 amps), diluted 2-fold in dilution buffer (20 mM Tris
7.5, 135 mM NaCl, 10% glycerol, 5 µM carfilzomib, 10 mM NEM, 1×
PhosSTOP and 1× cOmplete protease inhibitor cocktail) and clarified
for 5 min at low speed (2,400g). Clarified extracts were incubated with
20 µg of anti-K11/K48 bispecific ubiquitin antibody or control normal
mouse IgG and 40 µl of protein G-agarose slurry for 3 h at room
temperature. Beads were washed 10× with dilution buffer, eluted with
2× urea sample buffer, and resolved on SDS-acrylamide gels.


Mass spectrometry 

Mass spectrometry was performed on immunoprecipitates prepared
from HEK293T cells. In brief, 20 15-cm plates of HEK293T cells were
PEI-transfected (if indicated), grown to confluence, synchronized (if
indicated), collected and lysed in lysis buffer (20 mM HEPES, pH 7.4,
5 mM KCl, 150 mM NaCl, 1.5 mM MgCl₂, 0.1% Nonidet P-40 and 1×
cOmplete protease inhibitor cocktail). Lysed extracts were clarified by
high-speed centrifugation, precleared with protein G-agarose slurry and
bound to indicated antibodies pre-coupled to protein G-agarose resin
(for immunoprecipitations of endogenous proteins) or anti-Flag M2
affinity resin (for immunoprecipitations of overexpressed proteins).
Immunoprecipitates were then washed and eluted 3× at 30 °C with
0.5 mg ml⁻¹ of 3× Flag peptide (Sigma, F4799) buffered in 1× PBS plus
0.1% Triton X-100. Elutions were pooled and precipitated overnight
at 4 °C with 20% trichloroacetic acid. Immunoprecipitates were then
pelleted, washed 3× with an ice-cold acetone/0.1 N HCl solution, dried,
resolubilized in 8 M urea buffered in 100 mM Tris 8.5, reduced with
TCEP (at a final concentration of mM) for 20 min, alkylated with
iodoacetamide (at a final concentration of 10 mM) for 15 min, diluted
4-fold with 100 mM Tris 8.5, and digested with 0.5 mg ml⁻¹ of trypsin
supplemented with CaCl₂ (at a final concentration of mM) overnight at
37 °C. Trypsin-digested samples were submitted to the Vincent J. Coates
Proteomics/Mass Spectrometry Laboratory at UC Berkeley for analysis.
Peptides were processed using multidimensional protein identification
technology (MudPIT) and identified using an LTQ XL linear ion trap mass
spectrometer. To identify high-confidence interactors, CompPASS
analysis of the query mass spectrometry result was performed against
mass spectrometry results from unrelated Flag immunoprecipitates
performed in our laboratory.

For TMT labelling, samples were prepared in the same manner as
previously described. Following trypsin digestion, however, samples
were desalted using a C18 column (Agilent, AS57203), dried overnight,
resuspended in 80 µl of 200 mM HEPES, pH 8.0 and quantified using
the Pierce Quantitative Colorimetric Peptide Assay kit (Pierce, 23275)
on a microplate reader. Peptides were then normalized to equal masses
in 100 µl volumes with 200 mM HEPES, pH 8.0. TMT labelling was
performed using the TMTsixplex Isobaric Mass Tagging Kit (Thermo
Fisher, 90066) per the manufacturer's instruction. Labelled peptides
were combined in equal volumes, desalted, dried and identified using a
Fusion Lumos mass spectrometer by the Vincent J. Coates Proteomics/
Mass Spectrometry Laboratory.
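
The normalization of peptides to equal masses in 100 µl volumes can be illustrated with a short calculation. This is only a sketch under stated assumptions: the text does not describe how the target mass was chosen, so here it is set by the most dilute sample, and the function name, the sample names and the concentrations in the usage note are hypothetical.

```python
def equal_mass_aliquots(conc_ug_per_ul, available_ul=80.0, final_ul=100.0):
    """Plan equal-mass aliquots.

    conc_ug_per_ul maps sample name -> measured peptide concentration (ug/ul).
    Returns {sample: (peptide_ul, buffer_ul)} such that every tube holds the
    same peptide mass in a final volume of final_ul.
    """
    if any(c <= 0 for c in conc_ug_per_ul.values()):
        raise ValueError("concentrations must be positive")
    # Assumption: the sample with the least total peptide sets the target mass.
    target_ug = min(c * available_ul for c in conc_ug_per_ul.values())
    plan = {}
    for sample, conc in conc_ug_per_ul.items():
        peptide_ul = target_ug / conc            # volume holding target_ug
        plan[sample] = (peptide_ul, final_ul - peptide_ul)  # top up with buffer
    return plan
```

For example, `equal_mass_aliquots({"control": 0.5, "treated": 1.0})` takes all 80 µl of the more dilute sample and 40 µl of the other, each topped up to 100 µl with 200 mM HEPES, pH 8.0.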


Immunofluorescence microscopy 
For immunofluorescence analysis of neural inductions, H1 cells and H1
cells undergoing neural conversion were seeded on Matrigel-coated
96-well plates in mTeSR1 or STEMdiff neural induction medium plus
10 µM Y-27632 for 24 h, washed with 1× PBS plus mM MgCl₂ and 1 mM
CaCl₂, fixed with 4% paraformaldehyde buffered in 1× PBS for 15 min,
permeabilized in 1× PBS plus 0.1% Triton X-100 for 10 min, blocked in
10% FBS plus 0.1% Triton X-100 for 30 min and stained with indicated
antibodies diluted in 10% FBS plus 0.1% Triton X-100. Images were taken
on an Opera Phenix High-Content Screening System (PerkinElmer)
using a 40× air objective and processed using Harmony High Content
Imaging and Analysis Software (PerkinElmer).


Live-cell imaging 
H2B-mCherry-expressing H1 cells were transfected with indicated
siRNAs and seeded on Matrigel-coated 8-chamber microscopy slides
(Lab-Tek II, 155409). Twenty-four to forty-eight hours after
transfection, cells were imaged every 3 min for 12-14 h using a Zeiss
LSM 710 confocal microscope with a 20× objective. Mitotic cells were
identified manually.


Analysis of cell-cycle progression 

For DNA content analysis, single-cell suspensions were generated with
trypsin, fixed for 15 min with 4% paraformaldehyde buffered in 1× PBS,
washed with 1× PBS and incubated with 2 µg ml⁻¹ of Hoechst 33342
buffered in 1× PBS for 30 min at room temperature with gentle rocking.
Single cells were filtered through a mesh strainer and analysed using
an LSRFortessa flow cytometer (Becton Dickinson). Cytometry data
were processed using the FlowCytometryTools Python package and
custom-built Python scripts.
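
The custom scripts themselves are not reproduced here. As an illustration of what DNA-content gating involves, the sketch below bins cells into cell-cycle phases from their Hoechst intensity relative to the G1 (2N) peak. It is not the authors' code: the function names, the `tolerance` parameter and the gate boundaries are assumptions chosen for the example, and it uses only the Python standard library rather than FlowCytometryTools.

```python
from collections import Counter


def classify_dna_content(intensity, g1_peak, tolerance=0.15):
    """Assign a phase from DNA-stain intensity relative to the G1 (2N) peak.

    Gates (assumed for illustration): G1 lies within +/- tolerance of the G1
    peak, G2/M within +/- 2*tolerance of twice the G1 peak, S in between.
    """
    ratio = intensity / g1_peak
    if ratio < 1 - tolerance:
        return "sub-G1"
    if ratio <= 1 + tolerance:
        return "G1"
    if ratio < 2 - 2 * tolerance:
        return "S"
    if ratio <= 2 + 2 * tolerance:
        return "G2/M"
    return ">4N"


def phase_fractions(intensities, g1_peak, tolerance=0.15):
    """Return the fraction of cells falling in each phase."""
    counts = Counter(
        classify_dna_content(value, g1_peak, tolerance) for value in intensities
    )
    total = len(intensities)
    return {phase: n / total for phase, n in counts.items()}
```

In practice the G1 peak position would be estimated from the intensity histogram of each sample; here it is passed in explicitly to keep the example self-contained.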


Sonication and ChIP-qPCR analysis

Cells were resuspended in 1× PBS and fixed at room temperature with 1%
formaldehyde (Fisher, UNII98) for 10 min or with 1.5 mM ethylene glycol
bis(succinimidyl succinate) (EGS) for 20 min followed by 1%
formaldehyde for an additional 10 min. Crosslinking reactions were
quenched with 125 mM glycine buffered in 1× PBS for 10 min. Crosslinked
cells were washed twice with 1× PBS, collected, snap-frozen and stored
at -80 °C for later use. Collected pellets were resuspended in
sonication buffer (50 mM Tris 8.0, 10 mM EDTA, 1% SDS and 1× cOmplete
protease inhibitor cocktail), incubated on ice for 15 min and pelleted
at 2,000g. Pellets were washed 4× with sonication buffer and sonicated
in 12×24-mm tubes (Covaris, 520056) at 150 W (peak power) using an
S220 ultrasonicator (Covaris) with a duty factor of 20 and 200 cycles
per burst for 30-35 cycles (30 s on and 30 s off). Sonicated extracts
were clarified by high-speed centrifugation, snap-frozen and stored
at -80 °C for later use. ChIP extracts were diluted 10-fold in dilution
buffer (20 mM Tris 8.0, 167 mM NaCl, 1 mM EDTA, 1% Triton X-100 and
1× cOmplete protease inhibitor cocktail), precleared with protein
G/A-agarose resin and bound overnight to the indicated antibodies
(Supplementary Table 2) at 4 °C. Antibodies were pulled down by
addition of BSA-blocked protein G/A-agarose resin. Beads were washed
twice with low salt wash buffer (20 mM Tris 8.0, 150 mM NaCl, 2 mM
EDTA, 1% Triton X-100 and 0.1% SDS), twice with high salt wash buffer
(20 mM Tris 8.0, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100 and 0.1%
SDS), once with LiCl buffer (20 mM Tris 8.0, 250 mM LiCl, 1 mM EDTA,
1% deoxycholate and 1% Nonidet P-40) and twice with 1× TE. Samples
were eluted twice at 30 °C with 1% SDS buffered in 1× TE. Eluates were
pooled, treated with RNase A and reverse-crosslinked overnight at
65 °C. Samples were then treated with proteinase K, phenol:chloroform
extracted, isopropanol precipitated and eluted in 10 mM Tris 8.
Resuspended samples were quantified using the KAPA SYBR FAST Universal
kit (Kapa Biosystems, KK406) on a QuantStudio 6 Flex Real-Time PCR
System (Applied Biosystems). ChIP-qPCR primers used in this study
can be found in Supplementary Table 3.


Real-time qPCR analysis 

For real-time qPCR analysis, total RNA was purified from cells using the
NucleoSpin RNA kit (Macherey-Nagel, no. 740955) or with acid phenol
and reverse-transcribed using the Maxima First Strand cDNA Synthesis
kit (Thermo Fisher, K1671). Expression levels were quantified using the
Luna Universal qPCR Master Mix (NEB, M3003) on a QuantStudio 6 Flex
Real-Time PCR System (Applied Biosystems). Real-time qPCR primers
used in this study can be found in Supplementary Table 3.


Sonication and ChIP-seq analysis

For sonication and ChIP-seq analysis, samples were prepared as
described in ‘Sonication and ChIP-qPCR analysis’. Libraries were
constructed by the Functional Genomics Laboratory at UC Berkeley,
multiplexed and sequenced by the Vincent J. Coates Genomics Sequencing
Laboratory at UC Berkeley on a HiSeq2500 or a HiSeq4000 (Illumina).
Alignments for the paired-end or single-read sequencing runs were
performed against the hg19 reference genome using Bowtie2. ChIP
peaks were called with MACS14. Downstream analyses were performed
using Bedtools and Deeptools.
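
The alignment and peak-calling steps can be sketched as command lines assembled in Python. This is a hedged illustration, not the parameters used in the study: the text names only the tools and the hg19 reference, so the file names, index prefix, helper names and the choice of SAM as the MACS14 input format are assumptions. The flags shown (`-x`, `-U`, `-1`/`-2`, `-S` for Bowtie2; `-t`, `-c`, `-f`, `-g`, `-n` for MACS14) are standard options of those programs.

```python
def bowtie2_command(index_prefix, sam_out, reads_1, reads_2=None):
    """Build a Bowtie2 command for single-read or paired-end FASTQ input."""
    cmd = ["bowtie2", "-x", index_prefix]
    if reads_2 is None:
        cmd += ["-U", reads_1]                 # single-read run
    else:
        cmd += ["-1", reads_1, "-2", reads_2]  # paired-end run
    return cmd + ["-S", sam_out]


def macs14_command(treatment_sam, name, control_sam=None):
    """Build a MACS14 peak-calling command for a human (hs) genome."""
    cmd = ["macs14", "-t", treatment_sam]
    if control_sam is not None:
        cmd += ["-c", control_sam]             # optional input/IgG control
    return cmd + ["-f", "SAM", "-g", "hs", "-n", name]


if __name__ == "__main__":
    # Hypothetical file names; pass each list to subprocess.run(cmd, check=True).
    print(" ".join(bowtie2_command("hg19", "chip.sam", "chip.fastq")))
    print(" ".join(macs14_command("chip.sam", "chip_peaks", "input.sam")))
```

Building the commands as lists keeps them easy to inspect or log before they are handed to `subprocess.run`.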


MNase ChIP-seq sample preparation

For MNase ChIP-seq analysis, human ES cells were collected by
accutase treatment, washed once with ice-cold 1× PBS and resuspended
in 1 ml of 1× PBS. Single-cell suspensions were crosslinked
with 1% formaldehyde for 10 min at room temperature, quenched
with glycine (at a final concentration of 125 mM) for 2 min, washed
with 1× PBS, snap-frozen in liquid nitrogen and stored at -80 °C for
later use. Frozen pellets were resuspended in an equal pellet volume
of lysis buffer (1% SDS, 10 mM EDTA, 50 mM Tris 8.0, 1× cOmplete
protease inhibitor cocktail and 1× PhosSTOP), incubated on ice for
10 min, diluted 4-fold with dilution buffer (1% Triton X-100, 130 mM
NaCl, 20 mM Tris 8.0, 2.5 mM CaCl₂, 1× cOmplete protease inhibitor
cocktail and 1× PhosSTOP), digested with 150 units of MNase
(Worthington, LS004798) per 200 µl of pellet volume for 5 min at
37 °C, quenched with 6 mM EDTA and 6 mM EGTA, spun at 20,000g to
remove debris, aliquoted, snap-frozen in liquid nitrogen and stored
at -80 °C for later use. MNase-digested chromatin was precleared
with protein-G dynabeads (Thermo, 10003D) and bound to indicated
antibodies overnight at 4 °C. Antibodies were immunoprecipitated
by addition of BSA-blocked protein-G dynabeads. Beads were
washed twice with low salt wash buffer (20 mM Tris 8.0, 150 mM NaCl,
2 mM EDTA, 1% Triton X-100 and 0.1% SDS), twice with high salt wash
buffer (20 mM Tris 8.0, 500 mM NaCl, 2 mM EDTA, 1% Triton X-100
and 0.1% SDS), once with LiCl buffer (20 mM Tris 8.0, 250 mM LiCl,
1 mM EDTA, 1% deoxycholate and 1% Nonidet P-40), and twice with
1× TE. Samples were eluted twice at 30 °C with 1% SDS buffered in
1× TE. Eluates were pooled, treated with RNase A and
reverse-crosslinked overnight at 65 °C. Samples were then treated with
proteinase K, phenol:chloroform extracted, isopropanol precipitated
and eluted in 10 mM Tris 8.


MNase ChIP-seq library construction 

Purified DNA (see ‘MNase ChIP-seq sample preparation’) was quantified
using a Fragment Analyzer (Agilent). Twenty-five nanograms of
purified DNA was resuspended up to 50 µl in water. Ten microlitres
of T4 DNA ligase buffer (NEB, B0202), 4 µl of 10 mM dNTPs, 5 µl of T4
DNA polymerase (NEB, M0203), 1 µl of Klenow DNA polymerase (NEB,
M0210), 5 µl of T4 DNA polynucleotide kinase (NEB, M0201) and 25 µl
of water were added to the diluted input DNA and incubated at 25 °C
for 30 min. Samples were purified with Ampure XP beads (Beckman,
A36881) and resuspended in 32 µl of water. Five microlitres of buffer 2
(NEB, B7002), 1 µl of 10 mM dATP, 3 µl of Klenow fragment (NEB, M0212)
and 9 µl of water were added to the end-repaired DNA and incubated at
37 °C for 30 min. Samples were purified with Ampure XP beads (Beckman,
A36881) and resuspended in 23 µl of water. Five microlitres of
Truseq Y adaptors for paired-end sequencing (custom-made), 5 µl of
10× ligase buffer (NEB, B0202), 1.5 µl of T4 DNA ligase (NEB, M0202)
and 15.5 µl of water were added to the 3′-adenylated DNA and incubated
at room temperature for 1 h. Samples were purified with Ampure XP
beads (Beckman, A36881) and resuspended in 30 µl of water. Three
microlitres of adaptor-ligated DNA was used for PCR amplification
(KAPA HiFi master mix, KK201).
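
As a consistency check on the three enzymatic steps, the component volumes can be tallied to confirm that each reaction reaches a round total in which the 10× buffer is diluted to 1×. The sketch below assumes the input DNA volume is 50 µl and reads the remaining volumes directly from the text; the dictionary keys are descriptive labels chosen for the example only.

```python
# Component volumes in microlitres for each library-construction step.
REACTIONS = {
    "end_repair": {
        "input DNA": 50.0,            # assumed: 25 ng brought up to 50 ul
        "T4 DNA ligase buffer": 10.0,
        "10 mM dNTPs": 4.0,
        "T4 DNA polymerase": 5.0,
        "Klenow DNA polymerase": 1.0,
        "T4 polynucleotide kinase": 5.0,
        "water": 25.0,
    },
    "a_tailing": {
        "end-repaired DNA": 32.0,
        "buffer 2": 5.0,
        "dATP": 1.0,
        "Klenow fragment": 3.0,
        "water": 9.0,
    },
    "adaptor_ligation": {
        "3'-adenylated DNA": 23.0,
        "adaptors": 5.0,
        "10x ligase buffer": 5.0,
        "T4 DNA ligase": 1.5,
        "water": 15.5,
    },
}

# Total volume of each reaction.
totals = {step: sum(parts.values()) for step, parts in REACTIONS.items()}

if __name__ == "__main__":
    for step, total in totals.items():
        print(f"{step}: {total:g} ul")
```

The totals come to 100 µl, 50 µl and 50 µl, so the 10 µl and 5 µl buffer additions each make up one-tenth of their reaction.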


MNase ChIP-seq and analysis 

MNase ChIP-seq samples (see ‘MNase ChIP-seq library construction’)
were multiplexed and sequenced by the Vincent J. Coates Genomics
Sequencing Laboratory at UC Berkeley on a HiSeq4000 (Illumina).
Alignments for the single-read sequencing runs were performed against
the hg19 reference genome using Bowtie2. ChIP peaks were called with
MACS14. Downstream analyses were performed using Bedtools and
Deeptools.


RNA-sequencing sample preparation and analysis 

Total RNA was purified from cells with TRIzol (Thermo, 15596026) and
digested with TURBO DNase (Thermo, AM2238). Total RNA was
poly(A)-selected and sequencing libraries were constructed using the
KAPA mRNA HyperPrep kit (KK8580) by the Functional Genomics
Laboratory at UC Berkeley. Libraries were sequenced by the Vincent J.
Coates Genomics Sequencing Laboratory at UC Berkeley on a HiSeq4000
(Illumina). Gene-expression analysis was performed using Kallisto
against hg19 as the reference genome.


Bioinformatics 

Identified ChIP peaks were subjected to bioinformatic analyses. GO
enrichment analyses were performed using DAVID 6.8
(https://david.ncifcrf.gov). Comparison with SAGE data was performed
using the CGAP-SAGE feature of DAVID, a web-based application
(https://david.ncifcrf.gov).


Purification of phosphomimetic APC/C-CDC20 with WDR5 for
negative-stain electron microscopy

Recombinant APC/C-CDC20 containing glutamate mutations that
mimic phosphorylation was purified as previously described. In
brief, APC/C and CDC20 were expressed independently in High Five
insect cells (Thermo Fisher Scientific) and colysed by mixing and
sonication. Cleared lysate was treated to tandem Strep- and
GST-affinity chromatography selections for APC2 and APC16,
respectively. GST elution fractions containing APC/C-CDC20 were
combined with TEV protease, HRV14 3C protease and purified
MBP-Flag-WDR5-His containing a TEV proteolytic site N-terminal of the
Flag tag. This mixture was further purified through Flag affinity
chromatography and eluted with antigenic peptides.


Negative-stain electron microscopy 

For negative-stain electron-microscopy studies, 125 µg of purified
APC/C-CDC20-WDR5 eluate from Flag immunoprecipitations was
loaded onto a 10-40% glycerol gradient containing 50 mM HEPES pH
8.0, 200 mM NaCl and 2 mM MgCl₂. For particle fixation by GraFix,
the gradient also contained 0.025% and 0.1% glutaraldehyde in the
lighter and denser glycerol solution, respectively, creating an
additional glutaraldehyde gradient from top to bottom (0.025-0.1%).
Centrifugation was performed at 34,000 rpm in a TH-660 rotor (Thermo
Fisher Scientific) for 15 h at 0 °C and the solution was subsequently
fractionated. APC/C particles were allowed to adsorb on a thin film of
carbon, transferred onto a plasma-cleaned lacey grid (LC200-CU,
Electron Microscopy Services) and then stained for 2 min with a 4%
(w/v) uranyl formate solution. Micrographs were collected on a FEI
Titan Halo at 300 kV with a Falcon 2 direct detector (FEI) (MPI of
Biochemistry). The nominal magnification was 45,000×, resulting in an
image pixel size of 2.37 Å per pixel on the object scale, and data were
collected in a defocus range of 1.5-3.5 µm. Particles were autopicked
using Relion. The contrast transfer function parameters were
determined using CTFFIND4. Using Relion, particles were extracted from
micrographs and subjected to 2D classification. Inconsistent class
averages were removed before 3D classification in Relion.
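
Two quantities implied by these acquisition parameters can be worked out directly. The sketch below computes the Nyquist limit for the stated pixel size (the finest spacing that can be sampled is twice the pixel size) and the composition at a given depth of the gradient. The linear gradient profile is an assumption made for illustration, since the text gives only the end points, and the function names are hypothetical.

```python
def nyquist_resolution(pixel_size_angstrom):
    """Best theoretically resolvable spacing: two pixels per period."""
    return 2.0 * pixel_size_angstrom


def gradient_value(fraction_from_top, top, bottom):
    """Value at a fractional depth, assuming a linear gradient (illustrative)."""
    if not 0.0 <= fraction_from_top <= 1.0:
        raise ValueError("fraction_from_top must lie between 0 and 1")
    return top + fraction_from_top * (bottom - top)


if __name__ == "__main__":
    # Pixel size of 2.37 A per pixel, as stated in the text.
    print(f"Nyquist limit: {nyquist_resolution(2.37):.2f} A")
    # Glycerol (10-40%) and glutaraldehyde (0.025-0.1%) halfway down the tube.
    print(f"glycerol: {gradient_value(0.5, 10.0, 40.0):.1f}%")
    print(f"glutaraldehyde: {gradient_value(0.5, 0.025, 0.1):.4f}%")
```

At 2.37 Å per pixel the Nyquist limit is 4.74 Å, well below the resolution expected from negative-stain data, so sampling is not the limiting factor here.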

Structural modelling was performed using UCSF Chimera (1.13.1).
To identify electron microscopy density corresponding to WDR5, the
electron microscopy reconstruction of APC/C-CDC20-WDR5 obtained
from 3D classification in Relion was superimposed with a prior map
from an APC/C-CDC20-substrate complex (EMDB-3385)
low-pass-filtered to a comparable resolution. Although the resolution
precludes definitive structural modelling, it allowed approximate,
global placement of the crystal structures of WDR5, along with the APC2
winged-helix box and APC11 RING domains, which are known to
be mobile and to adopt distinct orientations when bound to different
APC/C partner proteins.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

All original data are available from the corresponding author on 
Custom Python scripts are available from the corresponding author


Extended Data Fig. 1 | Ultracomplex shRNA screen identifies APC/C and
USP44 as regulators of human ES cell biology. a, Karyotype analysis of H1
OCT4-GFP cell line shows normal chromosome architecture. This line was
karyotyped before performing the screen by a third-party vendor (WiCell).
Twenty cells were counted, 8 were analysed and 4 were karyotyped as normal.
No clonal abnormalities were detected at the band resolution of 450-475.
b, H1 OCT4-GFP cells undergo neural conversion with an efficiency similar
to that of the unmodified parent line. This experiment was performed three
independent times with similar results. c, Deep sequencing read counts (log
transformed) for individual shRNAs (red dots) targeting the indicated gene
from the screen in Fig. 1b. Grey dots represent negative-control shRNAs.
P values (two-sided Mann-Whitney U test, not corrected for multiple
hypothesis testing) are indicated for each gene. d, Mass spectrometry analysis
shows that many quality-control enzymes associate with K11/K48-branched
chains in human ES cells. H1 human ES cells were synchronized in mitosis before
being subjected to affinity purification using K11/K48-bispecific antibodies
under denaturing conditions. Values listed in brackets are total spectral counts
of tryptic peptides for each protein. e, CRISPR-Cas9-edited USP44 H1 human
ES cells show impaired rates of neural conversion. Expression of markers of
pluripotency (OCT4), neural crest cells (SNAIL2) or neural progenitors (PAX6)
was determined at indicated times of differentiation by sodium dodecyl
sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and western blotting
using specific antibodies. This experiment was performed two independent
times with similar results.
Extended Data Fig. 2 | Characterization of APC/C and the role of USP44 in
pluripotency. a, Western blot of OCT4 and NANOG upon APC2 knockdown in
asynchronous H1 human ES cells. This experiment was performed two
independent times with similar results. b, Western blot of OCT4 and NANOG
upon knockdown of APC/C subunits in asynchronous H1 human ES cells. This
experiment was performed three independent times with similar results.
c, Real-time qPCR of OCT4 and NANOG upon APC2 knockdown in asynchronous
H1 human ES cells (mean of n = 4 independent experiments, ± s.d.). d, Flow
cytometry analysis of APC2 depletion in H1 OCT4-GFP human ES cells. H1
OCT4-GFP human ES cells were transfected with siRNA against APC2 for 48 h
before cytometry analysis. This experiment was performed two independent
times with similar results. e, Loss of the pluripotency marker OCT4 upon
depletion of APC2 or WDR5 requires entry into mitosis. H1 human ES cells were
transfected with indicated siRNAs for 36 h and treated with DMSO
(asynchronous), 5 µM STLC (mitotic arrest) or 200 mM thymidine (S-phase
arrest) for an additional 12 h before collection for western blot analysis. This
experiment was performed three independent times with similar results.
f, Flow cytometry analysis of asynchronous H1 human ES cells transfected with
indicated siRNAs for 72 h. This experiment was performed three independent
times with similar results.
Extended Data Fig. 3 | APC/C and WDR5 are required for human ES cell
survival. a, Mass spectrometry analysis of Flag-WDR5 purified from mitotic
HEK293T cells. Values listed in brackets are total spectral counts of tryptic
peptides of indicated proteins. b, Depletion of WDR5 phenocopies the
depletion of APC2 in H1 human ES cells. H1 cells were depleted with the
indicated siRNAs for 72 h before collection for western blot analysis. This
experiment was performed once. c, A cumulative fraction curve measuring the
length of each metaphase-to-anaphase transition. n = 112 cells for control
siRNA; n = 105 cells for siRNA against APC2; n = 106 cells for siRNA against
WDR5; and n = 217 cells for siRNAs against both APC2 and WDR5. d, Depletion of
APC2 or WDR5 causes cell death in H1 human ES cells. Cell death was measured
by trypan blue staining of dead cells (mean of n = 4 independent
experiments ± s.d.). e, Quantifying cell survival using chromosome
catastrophe as a proxy for cell death. H1 human ES cells virally expressing
H2B-mCherry were transfected with the indicated siRNAs for 24 h before imaging
by confocal microscopy. n = 97 cells for control siRNA; n = 104 cells for siRNA
against APC2; n = 90 cells for siRNA against WDR5; and n = 213 cells for siRNAs
against both APC2 and WDR5. f, Sister cells die immediately following mitotic
exit when depleted of APC2 and WDR5. H1 human ES cells virally expressing
H2B-mCherry were transfected with siRNA against APC2 and/or siRNA against
WDR5 for 24 h before imaging by confocal microscopy. The time of death, as
defined by cells undergoing chromosome catastrophe, was measured for each
sister (mean of n = 57 pairs of cells ± s.d.). g, Representative frames of live-cell
imaging from four independent experiments (in minutes) tracking the nuclei of
siRNA-depleted H1 human ES cells virally expressing H2B-mCherry. Arrows
mark individual sister cells upon mitotic exit. Chromosome catastrophe was
used as a proxy for cell death (time points 198 and 342). h, Flag-WDR5 associates
with APC/C in mitotic H1 human ES cells. Flag-WDR5 immunoprecipitations
were performed on asynchronous H1 human ES cells (A) or H1 human ES cells
arrested in mitosis (M). Bound proteins were determined by SDS-PAGE and
western blotting. This experiment was performed two independent times with
similar results. i, Overexpressed haemagglutinin (HA)-tagged USP44
associates with Flag-WDR5 in both asynchronous and mitotic HEK293T cells.
MYC-WDR5 was used as the control vector. This experiment was performed
three independent times with similar results.
Extended Data Fig. 4 | WDR5 associates with APC/C and TBP on distinct
surfaces. a, The WIN-motif binding site on WDR5 is critical for APC/C
engagement, whereas the surface that binds the WBM is dispensable. A
secondary binding surface (4) is also important for the association of WDR5
with APC/C. The surface on WDR5 that binds the WBM is critical for TFIID
association, whereas the WIN-motif binding site is dispensable. HEK293T cells
were transfected with the indicated Flag-WDR5 variants, and cells were
synchronized in mitosis. Flag-WDR5 was affinity-purified, and bound proteins
were determined by western blotting. This experiment was performed five
independent times with similar results. b, Reciprocal immunoprecipitations
show that APC/C binds WDR5 through its WIN-motif binding site. Endogenous
APC/C was purified from HEK293T cells expressing the indicated Flag-WDR5
variants, and bound proteins were determined by SDS-PAGE and western
blotting. This experiment was performed three independent times with similar
results. c, Heat map of bait-normalized total spectral counts identified from
Flag-WDR5-purified mass spectrometry experiments. HeLa cells were
transfected with Flag-WDR5 for 24 h before mitotic synchronization. d, The
WDR5 inhibitor MM-102 impairs the association of WDR5 with APC/C. Mitotic
HeLa S3 cells were released into MM-102 for 2 h before immunoprecipitation
experiments. Under these conditions, MM-102 did not prevent the association
of WDR5 with MLL and RBBP5. This experiment was performed two
independent times with similar results. e, Expression of wild-type WDR5 but
not WDR5(ΔWIN) rescues the pluripotency defect caused by WDR5 depletion
in H1 human ES cells. H1 human ES cells virally expressing siRNA-resistant
WDR5 variants (WDR5 versus WDR5(ΔWIN)) were depleted of endogenous
WDR5 (W) or treated with control siRNA (C). Expression of OCT4 and NANOG
was determined by western blotting. This experiment was performed once.
western blotting using specific antibodies. This experiment was performed
two independent times with similar results. b, In vitro translation binding
assays reveal that APC2 directly interacts with recombinant WDR5,
subunits. APC2 binding was validated three independent times.
c, Quantification of autoradiography blot shown in b.
Extended Data Fig. 6 | WDR5 associates with active APC/C. a, APC/C does not
ubiquitylate WDR5 in vitro. Recombinant WDR5 was incubated with active
APC/C, E1, UBE2C, UBE2S and ubiquitin, and potential reaction products were
detected by western blotting against WDR5. This experiment was performed
once. b, APC/C-dependent ubiquitylation of geminin is outcompeted by
recombinant securin (comp), a canonical substrate, but not by recombinant
WDR5. Securin or WDR5 was added to APC/C-dependent geminin
ubiquitylation reactions at the indicated concentrations, and various reaction
products were detected using western blotting. Asterisks represent
cross-reactive bands. This experiment was performed once. c, APC/C-WDR5-dependent
ubiquitylation of geminin is inhibited by EMI1. WDR5 affinity
purifications from mitotic HeLa cells were incubated with E1, the
APC/C-specific E2 enzymes UBE2C and UBE2S, and ubiquitin. EMI1 was added
at indicated concentrations, and reaction products were detected by western
blotting using antibodies against geminin. This experiment was performed
two independent times with similar results. d, Immunoprecipitation of
Flag-WDR5 from mitotic HEK293T cells coprecipitates K11-linked ubiquitin chains.
HEK293T cells arrested in prometaphase were released into fresh medium, and
WDR5 was affinity-purified at the indicated time points. Bound proteins were
detected by western blotting. This experiment was performed once.
e, Depletion of UBE2S eliminates WDR5-associated K11-linked ubiquitin chains
in mitotic HEK293T cells. This experiment was performed once.
Extended Data Fig. 7 | Mitotic APC/C-WDR5 complexes are catalytically
active. a, APC/C-WDR5 ubiquitylates human H3 in H3-H4 tetramers in vitro.
APC/C was affinity-purified from mitotic HeLa cells and incubated with E1,
UBE2C, UBE2S, ubiquitin and human H3-H4 tetramers, as indicated. Reaction
products were detected by western blotting using antibodies against H3 and
H4. This experiment was performed once. b, APC/C ubiquitylation of H2B in
X. laevis H2A-H2B histone dimers. APC/C-WDR5 was purified from mitotic
HeLa cells by Flag-WDR5 affinity purification and incubated with E1, UBE2C,
UBE2S, ubiquitin and X. laevis histone octamers. Ubiquitylation was detected
by western blotting against ubiquitylated H2B. This experiment was
performed three independent times with similar results. c, APC/C-WDR5
ubiquitylation of H2B in X. laevis H2A-H2B-H3-H4 histone octamers.
Reactions were performed as described in b. This experiment was performed
two independent times with similar results. d, APC/C purified from H1 human
ES cells is competent to ubiquitylate human H2B. This experiment was
performed two independent times with similar results. e, APC/C purified from
mitotic, but not S-phase, extracts can ubiquitylate H2B in vitro. APC/C was
purified from HeLa cells synchronized at the indicated cell-cycle stages and
incubated with E1, UBE2C, UBE2S, ubiquitin and X. laevis H2A-H2B dimers.
Histone ubiquitylation was detected by western blotting using antibodies
against ubiquitylated H2B. This experiment was performed once. f, APC/C-dependent
ubiquitylation of H2B requires the K11 residue on ubiquitin for chain
elongation. Ubiquitylation of H2A-H2B dimers by APC/C-WDR5 was
performed as described in e, but with ubiquitin variants. This experiment was
performed once. g, APC/C-dependent ubiquitylation of H2B requires both K11
and K48 on ubiquitin for synthesis of branched chains. This experiment was
performed two independent times with similar results. h, Securin, a canonical
APC/C substrate, outcompetes H2A-H2B dimers for APC/C-dependent
ubiquitylation. The D-box motif (an APC/C-CDC20-specific degron) is
required for full competition, whereas the KEN motif (an APC/C-CDH1-specific
degron) is not. This experiment was performed two independent times with
similar results. i, Polyubiquitylated H2B is degraded by the proteasome. K11/K48-branched
chains were purified under denaturing conditions from mitotic
HeLa cells either in the presence or absence of MG132, and modified H2B was
detected using western blotting. Proteasome inhibition with MG132 was found
to stabilize endogenous polyubiquitylated H2B. This experiment was
performed four independent times with similar results.


Extended Data Fig. 8 | ChIP-seq analyses of APC/C- and WDR5-occupied
gene targets. a, Overall histone levels do not change upon release from
mitosis. H1 cells were synchronized in mitosis by STLC and released into fresh
medium. Indicated proteins were monitored by western blotting. This
experiment was performed two independent times with similar results.
b, Comparison between ChIP-seq and MNase ChIP-seq against anti-K11 from
mitotic H1 human ES cells reveals that sonication shears polymeric ubiquitin
linkages. c, Venn diagram of anti-K11 and anti-WDR5 ChIP peaks that colocalize
with TSSs from MNase ChIP-seq experiments. MNase ChIP-seq experiments
were performed from mitotic H1 human ES cells. d, Heat map of MNase ChIP-seq
data from mitotic H1 human ES cells. Cluster 1 includes sites that are
co-occupied by K11 and WDR5 near TSSs (within 100 bp); cluster 2 includes sites
that are co-occupied by K11 and WDR5 outside of TSSs; and cluster 3 includes
sites occupied only by K11, regardless of colocalization with TSSs. e, ChIP-qPCR
analysis of candidate targets using K11 or K11/K48-linkage-specific
ubiquitin antibodies from mitotic H1 human ES cells. Mean of independent
replicates ± s.d. n = 3 for K11/K48; n = for IgG and K11 (except n = 4 for PUM1).
f, ChIP-qPCR analysis of mitotic H1 human ES cells shows that K11 linkages
synthesized at candidate sites are dependent on UBE2S and WDR5. This
experiment was performed once. g, WDR5 inhibition prevents K11-ubiquitin
chain formation at APC/C-WDR5-bound TSSs. H1 human ES cells were treated
with or without 50 μM MM-102 during mitotic synchronization with STLC
before anti-K11 MNase ChIP-seq. Heat map of all APC/C-WDR5-bound TSSs is
shown. h, Heat map of ChIP-seq peaks of individual genes co-occupied by
Flag-CDC20 and Flag-WDR5. ChIP-seq against anti-Flag was performed on
mitotic HEK293T cells that overexpress Flag-CDC20 or Flag-WDR5. i, Spatial
profile of PUM1 of factor occupancy by ChIP-qPCR. This experiment was
performed once. j, Spatial profile of E2F3 of factor occupancy by ChIP-qPCR.
This experiment was performed once. k, Heat map of MNase ChIP-seq data of
transcription-factor binding. Previously published MNase ChIP-seq data were
obtained, and APC/C-bound sites were analysed as described in d.
APC/C-bound sites were analysed as follows: cluster includes sites that are
Extended Data Fig. 10 | Regulation of chromatin and transcription by
APC/C-WDR5. a, Comparison of genes co-occupied by Flag-CDC20 and
Flag-WDR5 from mitotic HEK293T cells with known gene-expression profiles
reveals a strong overlap with ES cell and medulloblastoma cancer cell lines.
n = 1,628 genes were analysed (P values represent a one-sided Fisher's exact test
with Bonferroni correction). b, Loss of APC/C-WDR5 function interferes with
the expression of genes marked with K11-linked ubiquitin chains in H1 human
ES cells. Poly(A)-selected RNA was purified from asynchronous H1 human ES
cells transfected with control siRNA or siRNA against WDR5 for 48 h, and
subjected to RNA-sequencing analysis (a biological replicate of Fig. 4h).
c, Transcript analysis of WDR5 depletion on APC/C-WDR5-dependent genes
(from Fig. 4h and b). Box plots include the median TPM value (n = 90 genes) with
quartile ranges Q1-Q3; top whiskers represent the 3rd quartile + 1.5×
interquartile range; bottom whiskers represent the 1st quartile − 1.5×
interquartile range. P values were calculated from comparing individual TPM
values of APC/C-WDR5-regulated genes (n = 90) versus all transcripts
(n = 18,791) using a two-sided Student's t-test (unpaired). d, Real-time qPCR
analysis of nascent RNA reveals APC/C-WDR5 target genes are reactivated
upon mitotic exit and gene reactivation is dependent on WDR5. Initial
screening from a single experiment. e, The RNA levels of genes regulated by
APC/C-WDR5 do not change upon mitotic exit. RNA-sequencing analysis was
performed on poly(A)-selected RNA purified from H1 human ES cells at the
indicated cell-cycle stages. Box plots were derived as described in c
(n = 90 genes). f, Ubiquitylated H2B preferentially associates with p97-UBXN7
in vitro. H2B was preubiquitylated by APC/C in vitro, and incubated with
immobilized p97 or p97-UBXN7 complexes. Bound histone H2B was detected
by western blotting. This experiment was performed three independent times
with similar results. g, Flag-UBXN7 associates with polyubiquitylated H2B, p97
and K11/K48-linked branched ubiquitin chains in mitosis. Native Flag-UBXN7
immunoprecipitations were performed on mitotic HEK293T cells and bound
proteins were detected by western blotting or Ponceau staining. This
experiment was performed three independent times with similar results.
h, H2B ubiquitylation is stabilized by p97 inhibition in cells. Denaturing K11/K48
immunoprecipitations were performed on H1 human ES cells synchronized
in prometaphase or released into 10 μM NMS-873 for 2 h. This experiment was
performed four independent times with similar results. i, p97 inhibition
restores K11 deposition at sites regulated by APC/C-WDR5 upon mitotic exit.
Anti-K11 MNase ChIP-seq was performed from H1 human ES cells synchronized
in mitosis (0 h) or released into fresh medium without (2 h + DMSO) or with p97
inhibition (2 h + 10 μM NMS-873). j, Anti-K11 MNase ChIP-qPCR of candidate
targets from mitotic H1 human ES cells (mean of n = 3 independent
replicates ± s.d.). H1 human ES cells were synchronized in mitosis (0 h) and
released into fresh medium for 2 h with the indicated drugs. k, Model of APC/C-dependent
gene activation upon mitotic exit.
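The whisker convention stated in panel c (3rd quartile + 1.5× interquartile range and 1st quartile − 1.5× interquartile range) can be sketched as follows. This is a minimal illustration only: the function name, the quartile interpolation method and the example values are assumptions, not the analysis code or data of this study.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles and whisker positions for a list of values.

    Whiskers follow the convention in the caption:
    upper = Q3 + 1.5 x IQR, lower = Q1 - 1.5 x IQR.
    The 'inclusive' quartile method is an assumption; other
    interpolation schemes give slightly different quartiles.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": q1 - 1.5 * iqr,
        "upper_whisker": q3 + 1.5 * iqr,
    }

# Hypothetical TPM values, for illustration only.
tpm = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = box_plot_summary(tpm)
print(summary)
```

With this convention the whisker positions are computed from the quartiles alone, so they can extend beyond the observed minimum and maximum.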
1:1000 for WB), Anti-TBP, rabbit (#44059, clone D5C9H, Cell Signaling, 1 ug per ChIP, 1:10,000 for WB, Lot 1), Anti-Tubulin,
mouse (#CP06, clone DM1A, EMD Millipore, 1:500,000 for WB, Lot 2681308), Anti-UBCH10/UBE2C, rabbit (A-650, Boston
Biochem, 1:2000 for WB, Lot 2600774), Anti-UBE2S, rabbit (ab177508, Abcam, 1:10,000 for WB, Lot GR290784-7), Anti-Vinculin,
rabbit (#4650, Cell Signaling, 1:1000 for WB, Lot 4), Anti-WDR5, mouse (sc-393080, clone G-9, SCBT, 10 ug per IP, Lot C2816),
Anti-WDR5, rabbit (A302-430A, Bethyl Laboratories, 1:4000 for WB), Anti-WDR5, rabbit (#13105, Cell Signaling, 1:2000 for WB,
Lot 1), Anti-WDR5, rabbit (C15410027, Diagenode, 1 ug per ChIP, Lot 001), Normal IgG, mouse (sc-2025, SCBT, 1-2 ug per ChIP,
1-10 ug per IP, Lot 33).


Validation Antibodies validated by siRNA knockdown: Anti-APC2 (#12301), Anti-APC3 (#12530), Anti-CDC20 (#14866), Anti-UBCH10/UBE2C
(A-650), Anti-UBE2S (ab177508), Anti-WDR5 (A302-430A), Anti-WDR5 (#13105), Anti-WDR5 (C15410027). Antibodies validated
on recombinant proteins: Anti-DYKDDDDK (#2368), Anti-DYKDDDDK (#14793), Anti-FLAG M2 (F1804), Anti-HA (#3724), Anti-Geminin
(sc-13015), Anti-H2A (#12349), Anti-H2B (#2934), Anti-ubiquityl-H2B (Lys120) (#5546), Anti-H3 (ab1791), Anti-H4
(ab10158), Anti-Securin (sc-22772). Antibodies validated by MS enrichment analysis: Anti-APC3/CDC27 (sc-9972), Anti-K11 (lab
stock), Anti-K11/K48 (lab stock), Anti-WDR5 (sc-393080). Antibodies validated by the lab previously: Anti-Actin (W69100),
Anti-APC1 (#13329), Anti-APC4 (sc-20985), Anti-APC5 (sc-20986), Anti-APC6/CDC16 (sc-5615), Anti-APC7 (sc-365649), Anti-APC10
(sc-20988), Anti-APC11 (#14090), Anti-CDH1 (C7855), Anti-Cyclin A (sc-596), Anti-Cyclin B1 (#4138), Anti-GAPDH (#5174),
Anti-HSP90-B (#741), Anti-Nanog (#3580), Anti-OCT3/4 (sc-8628), Anti-OCT3/4 (#2750), Anti-PAX6 (#901301), Anti-PAX6
(48528427), Anti-SNAIL2 (#9585), Anti-Tubulin (#CP06), Anti-Vinculin (#4650). Antibodies validated from manufacturer's site:
Anti-ASH2L (A300-489A, verified by heterologously transfecting ASH2L), Anti-E2F3 (GTX102302, verified by knockdown),
Anti-MLL1 (#14197, verified by knockout), Anti-PUM1 (#22322, verified by Sur et al, 2028, Kulkarni et al, 2028), Anti-RBBP5 (#13171,
verified by Bogershausen et al, 2015, Alvarado et al, 2017, Ishiuchi et al, 2019), Anti-TAF1 (#12781, verified by Le Gallo et al,
2017), Anti-TAF7 (#13506-1-AP, uncharacterized), Anti-TBP (#44059, verified by Dal-Pra et al, 2017, Qin et al, 2018, Zhang et al,
2019).


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) H1 (WA01), HEK293T cells (Berkeley Cell Culture Facility), HeLa (lab stock), and HeLa S3 (lab stock)


Authentication H1 hESCs were purchased directly from WiCell (OCT4/NANOG positive, karyotype analysis showed no chromosomal
anomalies). HEK293T were purchased directly from the Berkeley Cell Culture Facility (authenticated by short tandem repeat
analysis). HeLa lines were not authenticated.


Mycoplasma contamination All cell lines were routinely tested for mycoplasma contamination using the LONZA kit (MYCOALERT). All cell lines tested
negative for mycoplasma.


Commonly misidentified lines No commonly misidentified lines were used for this study. 
(See ICLAC register)
18-38 million reads per sample (paired-end sequencing) for ChIP performed in 293Ts, 12-90 million reads per sample
(single read) for ChIP performed in H1s.

Flag M2 (Sigma, F1804), histone H3 (Abcam, 1793), K11-linked ubiquitin chains (lab reagent), TBP (Cell Signaling, 44059),
WDR5 (Diagenode, C15410027)

macs14 -t P_sorted.bam -c Input_sorted.bam -n macs_filename -g 2.7e9 -S -w

Quality metrics for ChIP of 293T cells given by macs14 (1.4.2): 99% (13835/14000) Flag-CDC20 peaks at FDR 5% and above 5-fold
enrichment to input; 99% (4049/4090) Flag-WDR5 peaks at FDR 5% and above 5-fold enrichment to input. Quality


metrics for ChIP of H1s performed for K11 and WDR5 using macs14 (1.4.2): Peaks were chosen with a 5% FDR cutoff, above
10 to 12-fold enrichment compared to input, and a -10*LOG10(pvalue) value greater than 100-120
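The peak-quality figures above rest on two pieces of arithmetic: the fraction of peaks passing the cutoffs, and the -10*LOG10(pvalue) score. The sketch below works through both. The function names are illustrative assumptions and not part of macs14; only the peak counts are taken from the text.

```python
import math

def passing_percentage(passed, total):
    """Percentage of called peaks that pass the FDR and enrichment cutoffs."""
    return 100 * passed / total

def peak_score(p_value):
    """Score of the form -10*log10(p-value), as quoted in the quality metrics."""
    return -10 * math.log10(p_value)

# Peak counts quoted in the quality metrics.
cdc20_pct = passing_percentage(13835, 14000)
wdr5_pct = passing_percentage(4049, 4090)
print(f"Flag-CDC20: {cdc20_pct:.1f}%  Flag-WDR5: {wdr5_pct:.1f}%")

# A score cutoff of 100 corresponds to a p-value of 1e-10,
# and a cutoff of 120 to a p-value of 1e-12.
print(round(peak_score(1e-10)), round(peak_score(1e-12)))
```

Both fractions round to the 99% reported for the 293T data sets.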
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Single cell H1 Oct4-GFP suspensions were treated with accutase and immediately analyzed. 
LSR Fortessa 

BD Facsdiva (6.2), FlowCytometryTools (0.5.0) 

Measured Oct4 levels of all cells that passed FSC/SSC gates. Measured DNA content of cells that passed FSC/SSC gates.


FSC/SSC gates to exclude cell debris.


] Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
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Selective loading and processing of 
prespacers for precise CRISPR adaptation 
CRISPR-Cas immunity protects prokaryotes against invading genetic elements.
It uses the highly conserved Cas1-Cas2 complex to establish inheritable memory
(spacers). How Cas1-Cas2 acquires spacers from foreign DNA fragments
(prespacers) and integrates them into the CRISPR locus in the correct orientation is
unclear. Here, using the high spatiotemporal resolution of single-molecule
fluorescence, we show that Cas1-Cas2 selects precursors of prespacers from DNA in
various forms, including single-stranded DNA and partial duplexes, in a manner
that depends on both the length of the DNA strand and the presence of a protospacer
adjacent motif (PAM) sequence. We also identify DnaQ exonucleases as enzymes that
process the Cas1-Cas2-loaded prespacer precursors into mature prespacers of a
suitable size for integration. Cas1-Cas2 protects the PAM sequence from maturation,
which results in the production of asymmetrically trimmed prespacers and the
subsequent integration of spacers in the correct orientation. Our results demonstrate
the kinetic coordination of prespacer precursor selection and PAM trimming,
providing insight into the mechanisms that underlie the integration of functional
spacers in the CRISPR loci.


CRISPR (clustered regularly interspaced short palindromic repeats) and
Cas (CRISPR-associated) proteins constitute an RNA-guided adaptive
immune system that defends prokaryotes against invading nucleic
acids. The first step of immunity is adaptation, or spacer acquisition,
in which the genetic memory is updated from foreign DNA fragments
(prespacers) through the integration of spacers between the repeats of
the CRISPR loci. Adaptation relies on the highly conserved Cas1 and
Cas2 proteins, which form a heterohexameric Cas1₄-Cas2₂ integrase
complex (Cas1-Cas2). In vivo studies have suggested that Cas1-Cas2
identifies suitable prespacers on the basis of a PAM sequence, which is
another prerequisite for CRISPR interference in immunity.
Profiling of spacers showed that Cas1-Cas2 derives new spacers from
degradation intermediates that are produced by RecBCD during the
repair of double-stranded DNA breaks and by Cas3 during primed
adaptation mediated by the Cascade-Cas3 complex. Notably, both
RecBCD and Cas3 generate single-stranded DNA (ssDNA) degradation
products that contrast with the optimal substrate for integration by
Cas1-Cas2. For example, the integration-competent prespacer
DNA for Escherichia coli type I-E Cas1-Cas2 is composed of a central
23-base-pair (bp) duplex with two 5-nucleotide (nt) single-stranded
3'-overhangs (here termed the canonical prespacer). The ssDNA
fragments that are generated by RecBCD and Cas3 must be re-annealed
and processed into the canonical form before integration into the
CRISPR locus. However, whether Cas1-Cas2 has an active role in capturing
ssDNA fragments from these sources and making an effective
Cas1-Cas2-prespacer integration complex remains unknown.


Spacers must ultimately be integrated in the correct orientation
with respect to the position of the PAM sequence in the 3'-overhang.
Cas1-Cas2 has been shown to integrate spacers in either orientation
with an equal probability in vitro. However, only correctly oriented
spacers result in functional CRISPR RNAs for target recognition
in vivo. Recent in vivo and in vitro studies in type I-A, -B, -C and -D
CRISPR-Cas systems have shown that Cas4 has a critical role in the
maturation of prespacer DNA and high-fidelity spacer integration.
However, how systems that lack Cas4 (such as E. coli type I-E) coordinate
the correct orientation of new spacers is unknown.


Prespacer precursor selection by Cas1-Cas2

Cas1-Cas2 repurposes DNA fragments from invading genetic elements.
Most of these DNA fragments will have structures that deviate
from the canonical form of prespacer DNAs. Cas1-Cas2 should
therefore be able to bind non-canonical prespacer DNAs (precursor
prespacer DNAs). To visualize the process of precursor prespacer
loading at a high spatiotemporal resolution, we developed a single-molecule
Förster resonance energy transfer (smFRET) assay (Fig. 1a,
Extended Data Fig. 1a). In brief, biotinylated Cas1-Cas2 complexes
were immobilized on a microscope slide through biotin-streptavidin
linkage and presented with prespacer DNAs labelled with a donor (Cy3)
and an acceptor (Cy5) dye on the top and bottom strands, respectively
(Fig. 1b). These labelling positions yielded a FRET value of around 0.72
(Extended Data Fig. 1b) and enabled us to examine the binding events in
Fig. 1 | Single-molecule analysis of prespacer precursor selection by
Cas1-Cas2. a, Single-molecule assay to probe Cas1-Cas2 binding to canonical
and precursor prespacer (PS) DNA. PEG, polyethylene glycol. b, Time traces of
donor (Cy3, green) and acceptor (Cy5, red) fluorescence signals and FRET
efficiency (blue) exhibiting short-lived binding, where Δτ is the dwell time of
prespacer DNA-binding events to Cas1-Cas2. Canonical and precursor
prespacer constructs consist of a 23-bp central duplex, a top strand
Cy3-labelled at the 5'-end and a bottom strand Cy5-labelled at the 16th nucleotide
from the 5'-end. DNA was added at t = 5 s. c, Time trace of long-lived binding.
DNA was added at t = 5 s. d, Binding frequencies (k_on) calculated from the
cumulative probability of the arrival time. e, Dissociation rates (k_off) of


real time using total internal reflection fluorescence (TIRF) microscopy
(Fig. 1a, Extended Data Fig. 1a).

We assessed the properties of Cas1-Cas2 binding to a canonical prespacer
DNA (PS-5-nt) (Fig. 1b) with an optimal form for spacer integration
into the CRISPR loci. Notably, the majority of binding events
(more than 99%) showed transient interactions (Fig. 1b, Extended Data
Fig. 1c); by contrast, long stable-binding events were rarely observed
(less than 1%) (Fig. 1c). Experiments in the absence of Cas1-Cas2 showed
negligible non-specific binding (Extended Data Fig. 1d). Control experiments
in which the position of the dye on the precursor prespacer DNA
was varied did not show any substantial differences in binding kinetics
(Extended Data Fig. 1e-i).

To examine the binding of Cas1-Cas2 to precursor prespacer DNAs,
we tested a series of substrates with 3'-overhangs of more than 5 nt in
length (7, 10 and 15 nt) (Fig. 1b). Analysis of these binding events showed
that the binding frequency (k_on) increased with longer 3'-overhangs
(Fig. 1d, Extended Data Fig. 1j), whereas the average dwell time (Δτ)
(equal to 1/k_off, in which k_off is the dissociation frequency) remained
unchanged (Fig. 1e, Extended Data Fig. 1c). After further extension
of the 3'-overhang (20, 30, 40, 70 and 90 nt; Extended Data Fig. 1k),
k_on saturated at an overhang length of around 40 nt (Extended Data
Fig. 1l), whereas k_off remained comparable among the different overhang
lengths (Extended Data Fig. 1m). This suggests that the length at which
Cas1-Cas2 makes effective interactions with prespacer precursors
is limited by a distance of around 40 nt from the duplex region. As a
result of the frequent interactions with precursor prespacer DNAs, the
number of stably bound molecules increased proportionally to the
length of the 3'-overhang (Fig. 1f), while the survival rate of the stably
bound molecules remained constant (Fig. 1g, Extended Data Fig. 1n).
Furthermore, Cas1-Cas2 selected precursor prespacer DNAs with long
3'-overhangs effectively in a competitive environment (Extended Data
Fig. 1o-q). These results are consistent with the binding behaviour
observed in electromobility shift assays (EMSAs) (Extended Data Fig. 1r)
and previous studies.
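The relation between dwell time and dissociation rate used in this paragraph can be illustrated with a short sketch. It assumes that dwell times of short-lived events are exponentially distributed, so that k_off can be estimated as the reciprocal of the mean dwell time, and that arrival times of binding events are exponential with rate k_on × [DNA]. This maximum-likelihood shortcut stands in for the single-exponential fitting and cumulative-probability analysis described in the figure legend; the function names, rates and the 1 nM concentration are hypothetical values chosen for illustration.

```python
import random

def estimate_koff(dwell_times):
    """Dissociation rate (s^-1): reciprocal of the mean dwell time."""
    return len(dwell_times) / sum(dwell_times)

def estimate_kon(arrival_times, dna_conc_nM):
    """Binding frequency (s^-1 nM^-1) from exponentially distributed arrival times."""
    return len(arrival_times) / (sum(arrival_times) * dna_conc_nM)

# Simulated single-molecule data (hypothetical rates, not measured values).
rng = random.Random(0)
true_koff = 2.0    # s^-1
true_kon = 0.03    # s^-1 nM^-1
conc = 1.0         # nM
dwells = [rng.expovariate(true_koff) for _ in range(5000)]
arrivals = [rng.expovariate(true_kon * conc) for _ in range(5000)]

koff = estimate_koff(dwells)
kon = estimate_kon(arrivals, conc)
print(f"k_off ~ {koff:.2f} s^-1 (mean dwell time ~ {1 / koff:.2f} s)")
print(f"k_on  ~ {kon:.3f} s^-1 nM^-1")
```

In this picture a longer overhang that raises k_on shortens the waiting time before binding, while an unchanged k_off leaves the mean dwell time of each event the same.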

Next, we used a single-molecule assay to investigate whether Cas1-Cas2
recognizes the PAM sequence during the short-lived interactions.
First, we explored how the position of the PAM site in the 3'-overhang
short-lived events calculated from the dwell time of binding events by
single-exponential fitting. f, Relative number of molecules at 30 min after addition of
canonical or precursor prespacers (n = 3). Representative CCD images
(acceptor channel) are included as insets. Scale bars, 5 μm. g, Survival time of
stably bound canonical or precursor prespacers on Cas1-Cas2. Free DNA was
washed away after a 30-min incubation (n = 3). h, k_on and k_off of short-lived
events for various sequences at the PAM position. i, k_on and k_off of PAM variants
with Cas1(Q287A/I291G)-Cas2. Data are mean ± s.e.m. from 3 independent
replicates (f, g), or mean ± 95% confidence interval (CI), obtained by bootstrap
analysis of a single replicate with n > 500 individual molecules (d, e, h, i).
All data are representative of three replicates with similar results.
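The bootstrap confidence intervals mentioned in the legend can be sketched as follows. This is a generic percentile bootstrap under stated assumptions: the resampling count, the percentile method and the simulated dwell times are illustrative choices, and the function names are hypothetical rather than taken from the analysis software used here.

```python
import random

def rate_from_dwells(dwells):
    """Rate estimate (s^-1) as the reciprocal of the mean dwell time."""
    return len(dwells) / sum(dwells)

def bootstrap_ci(samples, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Molecules are resampled with replacement n_boot times and the
    statistic is recomputed on each resampled set.
    """
    rng = random.Random(seed)
    n = len(samples)
    estimates = sorted(
        statistic([samples[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Simulated single replicate with 600 molecules (hypothetical rate of 2 s^-1).
rng = random.Random(1)
dwells = [rng.expovariate(2.0) for _ in range(600)]

point = rate_from_dwells(dwells)
low, high = bootstrap_ci(dwells, rate_from_dwells)
print(f"k_off = {point:.2f} s^-1, 95% CI [{low:.2f}, {high:.2f}]")
```

Resampling individual molecules within one replicate captures the sampling uncertainty of that replicate only, which is why the legend separately notes that the data are representative of three replicates.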


affects the binding behaviour of Cas1-Cas2. When the PAM sequence
in the 3'-overhang was moved away from the optimal position, fewer
molecules were stably bound (Extended Data Fig. 2a, b), suggesting that
Cas1-Cas2 probed the 3'-overhang for PAM recognition. This finding is
consistent with previous studies that showed that Q287 and I291 in the
proline-rich C-terminal tail of Cas1b recognize C and T in positions +5
and +6, respectively, of the 5'-C(+5)T(+6)T(+7)-3' region of the PAM sequence
(5'-CTT-3' PAM) in the 3'-overhang (Extended Data Fig. 2c).

To further elucidate the PAM specificity of Cas1-Cas2, we generated
16 different precursor prespacer DNAs that encompass all of the
nucleotide combinations at the first and second positions (positions
+5 and +6) of the optimally located PAM. Cas1-Cas2 interacted with the
different PAM variants at distinct frequencies (Fig. 1h, Extended Data
Fig. 2d), while the dissociation rate remained unaltered (Fig. 1h). Among
these substrates, Cas1-Cas2 showed the highest binding frequency
for 5'-CTT-3' PAM, suggesting that Cas1-Cas2 preferentially binds precursor
prespacer DNAs that contain a PAM sequence. The difference
in binding frequency among substrates was no longer observed in a
C-terminal tail mutant (Q287A/I291G) of Cas1-Cas2 (Fig. 1i). In addition,
Cas1-Cas2 selected PAM-containing prespacer precursors through
frequent interactions in a competitive environment (Extended Data
Fig. 2e-g). This is consistent with in vivo data showing that spacers are
preferentially acquired from PAM-flanking sites.


Cas1-Cas2 facilitates ssDNA pairing

Studies have shown that Cas1-Cas2 repurposes DNA degradation products
from RecBCD and Cas3. These fragments are likely to be single-stranded
when being released from the enzymes. To test whether
Cas1-Cas2 captures two complementary ssDNAs to form an effector
integration complex, we repeated the single-molecule assay with a
series of ssDNA fragments. Although binding events were dictated
by the presence of the PAM sequence (Extended Data Fig. 3a), binding
frequencies (Fig. 2a, Extended Data Fig. 3b) and dissociation rates
(Fig. 2b, Extended Data Fig. 3c) remained unaltered with an increasing
number of PAM sites in the ssDNA fragments. By contrast, the binding
frequency, but not the dissociation rate, depended on the length of the




Fig. 2 | PAM-dependent ssDNA capture and facilitated pairing by
Cas1-Cas2. a, b, k_on (a) and k_off (b) values of ssDNA with various numbers of
PAM sequences binding to surface-immobilized Cas1-Cas2. c, Single-molecule
assay for Cas1-Cas2 facilitated strand pairing. A biotinylated strand was
surface-immobilized. The non-PAM (5'-TTT-3') strand was labelled with Cy3 and
the PAM (5'-CTT-3') strand was labelled with Cy5. Green (532 nm) and red
(640 nm) lasers were used to excite Cy3 and Cy5, respectively. d, Number of
binding events over time. Solid lines represent a single-exponential fit. The k_on
values (0.031 ± 0.003 s^-1 nM^-1 for CTT and 0.027 ± 0.005 s^-1 nM^-1 for TTT) were
calculated from the cumulative probability. The data were normalized from
0 to 1 by using the highest and lowest number from the CTT prespacer
precursor. Data are mean ± 95% CI, obtained by bootstrap analysis of a single
replicate with n > 500 individual molecules (a, b). All data are representative of
three replicates with similar results.
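The normalization described in panel d, in which counts are scaled from 0 to 1 using the highest and lowest number from the CTT prespacer precursor, can be sketched as below. The function name and the example counts are hypothetical and serve only to show that both traces are scaled against the same CTT reference range.

```python
def normalize_to_reference(counts, reference):
    """Scale counts to the 0-1 range defined by the reference trace.

    The minimum and maximum are taken from the reference (here the CTT
    trace), so a second trace scaled the same way stays directly comparable.
    """
    lowest, highest = min(reference), max(reference)
    if highest == lowest:
        raise ValueError("reference trace has no range to normalize against")
    return [(c - lowest) / (highest - lowest) for c in counts]

# Hypothetical cumulative numbers of binding events over time.
ctt_counts = [0, 10, 25, 40]
ttt_counts = [0, 5, 12, 20]

ctt_norm = normalize_to_reference(ctt_counts, ctt_counts)
ttt_norm = normalize_to_reference(ttt_counts, ctt_counts)
print(ctt_norm)
print(ttt_norm)
```

Because the TTT trace is scaled by the CTT range, its plateau below 1 directly reflects the lower number of annealed PAM-deficient strands.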


flanking sequence (Extended Data Fig. 3d, e). These findings suggest
that Cas1-Cas2 might use a facilitated diffusion mechanism to locate
PAM sequences on ssDNA (Extended Data Fig. 3f).

Given that prespacer DNAs need to be duplexed for spacer integration,
we investigated whether Cas1-Cas2 facilitates the pairing between
the PAM-containing top strand and its complementary bottom strand
by designing a single-molecule DNA-capture assay (Fig. 2c). After
immobilization of PAM-deficient bottom strands on the surface, we
introduced two complementary top strands at 1 nM: a Cy5-labelled
PAM-containing strand and a Cy3-labelled PAM-deficient strand (Fig. 2c).
In the presence of Cas1-Cas2, the top strands accumulated rapidly;
conversely, no accumulation of binding was detected in
the absence of either Cas1-Cas2 or the biotinylated bottom strand.
Moreover, the PAM-containing top strand exhibited a higher binding
affinity than the PAM-deficient version, resulting in a higher number
of PAM-containing than PAM-deficient strands becoming annealed
(Fig. 2d). These results are consistent with data from EMSA experiments
(Extended Data Fig. 3g). Therefore, Cas1-Cas2 facilitates the pairing
of complementary ssDNA fragments into precursor prespacer DNAs,
with the PAM sequence acting as an identification marker.


Precursor prespacer trimming by DnaQ enzymes 

Once the PAM-containing prespacer precursors with long 3'-overhangs
are selected by Cas1-Cas2, the overhangs need to be processed into the
canonical size of 5 nt for efficient integration into the CRISPR locus
(Extended Data Fig. 4). Although a previous study suggested that
Cas1-Cas2 itself could process the 3'-overhang into the optimal 5-nt
size using the potential endonucleolytic activity of Cas1 subunits, we
were not able to reproduce this activity (Fig. 3a, Extended Data Fig. 5a).
This suggests that there may be an alternative mechanism to produce
canonical prespacer DNAs.

In the Streptococcus thermophilus type I-E CRISPR-Cas system, Cas2
is fused with a DnaQ-like domain that exhibits prespacer maturation
activity. Thus, we hypothesized that 3'-5' exonucleases with DnaQ-like
domains might naturally act as 3'-overhang-trimming enzymes
('trimmers') in E. coli. To test whether any 3'-5' exonucleases with or
without DnaQ-like domains could trim prespacer precursors into the
mature versions, we tested the following enzymes: DNA polymerase I
(DNA Pol I), which contains an exonuclease II (ExoII) motif; the core
complex (α, ε and θ subunits) of DNA polymerase III holoenzyme
Fig. 3 | Delayed PAM trimming results in integration of spacers in the correct
orientation. a, In vitro trimming assay using wild-type Cas1-Cas2 and 3'
exonuclease candidates. DnaQ or DnaQ-like enzymes are marked with an
asterisk. b, In vitro trimming assay using wild-type (WT) Cas1-Cas2 and mutant
Cas1(Q287A/I291G)-Cas2 with DNA Pol III and ExoT. Precursor prespacer DNA
substrates are shown on top, and untrimmed (U), partially trimmed (P),
trimmed (T) and degraded (D) products are indicated (a, b). Samples were
collected after 30-min incubation with exonucleases. c, In vitro trimming-driven
integration assay using wild-type and mutant Cas1(Q287A/I291G)-Cas2
and DNA Pol III core. Dye-labelled canonical prespacer DNA (5-nt-C/5-nt-C) and
precursor prespacer DNAs without PAM (7-nt-CGT/7-nt-CGT) or with PAM
(7-nt-CGT/7-nt-CTT) were used. The linear CRISPR DNA substrate was modified
with three consecutive PTO linkages at both 3'-ends for protection against
non-specific degradation by DNA Pol III. The products of integration at the leader
side and the spacer side are 78-nt and 113-nt long, respectively. The contrast of
areas of spacer-side and leader-side integration products was adjusted for
optimal visibility. For gel source data, see Supplementary Fig. 1. All data are
representative of three replicates with similar results.


(DNA Pol III core); the DNA Pol III holoenzyme (DNA Pol III HE); exonuclease
I (ExoI); RecBCD (ExoV); exonuclease VII (ExoVII); and exonuclease
T (ExoT). We designed an in vitro trimming assay that enabled us to
distinguish the efficiencies of the 3'-5' exonuclease enzymes on both
PAM-containing and PAM-deficient strands (Fig. 3a).

The trimming assay showed that DNA Pol I and ExoI exhibited
a weak level of trimming (Fig. 3a): the majority of products were
partially trimmed (to a size of around 32 nt) and could not be used as
substrates for adaptation. Notably, DNA Pol III (core or holoenzyme)
and ExoT trimmed the 3′-overhang of the PAM-deficient bottom strand
to the canonical size of 28 nt (5-nt 3′-overhang) (Fig. 3a, Extended Data
Fig. 5). By contrast, most PAM-containing top strands were partially
trimmed to a non-canonical size (around 31 nt) (Fig. 3a, Extended Data
Fig. 5a). An assay using a prespacer precursor without a PAM sequence
resulted in both strands being trimmed to the canonical size (28 nt)
(Extended Data Fig. 5b), whereas a prespacer precursor that contained
a PAM sequence in both strands showed only partially trimmed strands
(around 31 nt) (Extended Data Fig. 5b). These results demonstrate that
Fig. 4 | Prespacer selection, trimming and integration in naive and primed
adaptation. (1) RecBCD and Cascade-Cas3 complexes generate ssDNA
degradation fragments that can be used for naive and primed adaptation,
respectively. (2) PAM-containing (5′-CTT-3′) ssDNA strands are captured by
Cas1-Cas2. (3) Complementary strands are annealed by Cas1-Cas2. Prespacer
precursors loaded into Cas1-Cas2 are likely to have 3′-overhangs longer than
5 nt. (4) A PAM-deficient strand is converted to the canonical size (5 nt) by DNA
Pol III or ExoT, whereas a PAM-containing strand is partially trimmed (to a size
of about 8 nt) owing to the interaction between PAM and the C-terminal tail of
Cas1b. (5) The mature non-PAM-derived 3′-end is integrated at the leader side of
the first repeat. (6) The partially trimmed PAM-derived 3′-overhang of the
half-site intermediate is released and further trimmed into the canonical size.
(7) The mature PAM-derived end is integrated at the spacer side. (8) Cas1-Cas2
disengages from the CRISPR locus by an unknown mechanism. (9) DNA repair
enzymes fill the gaps, duplicating repeats. Unlike a previous model (Extended
Data Fig. 9), this ‘delayed PAM trimming’ model explains the bias for correct
orientation of a newly integrated spacer in the CRISPR locus.


prespacer precursors with a PAM site in only one of the two 3′-overhangs
are asymmetrically trimmed by DNA Pol III and ExoT trimmers.
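
The strand sizes quoted in this section follow from simple length arithmetic, which the short sketch below makes explicit. It assumes that each strand contributes 23 nt to the central duplex (the 23-bp duplex described for the prespacer constructs in Extended Data Fig. 1k) and that this also holds for the trimming substrates; the function name is illustrative and not part of the study.

```python
DUPLEX_NT = 23  # assumption: 23-bp central duplex, as described for Extended Data Fig. 1k


def strand_length(overhang_nt: int, duplex_nt: int = DUPLEX_NT) -> int:
    """Length of one prespacer strand: duplex-forming region plus its 3'-overhang."""
    if overhang_nt < 0:
        raise ValueError("overhang length must be non-negative")
    return duplex_nt + overhang_nt


print(strand_length(5))  # canonical strand with a 5-nt overhang -> 28
print(strand_length(8))  # partially trimmed PAM-containing strand -> 31
```

Under this assumption, the 28-nt and 31-nt products correspond to 5-nt and 8-nt 3′-overhangs, respectively.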


Biased integration by asymmetric trimming 

Our data suggest that the C-terminal tail of Cas1 is essential for
recognizing the PAM sequence (Fig. 1i, Extended Data Fig. 2c). We thus
hypothesized that the PAM site in the 3′-overhang of prespacer
precursors is protected from nuclease attacks by the interaction with the
C-terminal tail of Cas1. In vitro trimming assays with the PAM-binding
mutant Cas1(Q287A/I291G)-Cas2 showed that the C-terminal tail of
Cas1b protects the PAM from being processed by trimmers and thereby
results in prespacer DNAs with asymmetrically trimmed 3′-overhangs
(Fig. 3b, Extended Data Fig. 5c, d).

From the asymmetry in prespacer maturation, we hypothesized that
the canonically trimmed PAM-deficient strand (28 nt) is integrated
first at the leader-side integration site (L-site), producing a half-site
intermediate; this is followed by the integration of the non-canonically
trimmed PAM-containing strand (31 nt) at the spacer-side integration
site (S-site) of the CRISPR DNA. To test whether the asymmetry in the
prespacer precursors generates a bias for correctly oriented spacers,
we repeated the in vitro integration assay with asymmetrically trimmed
prespacer DNAs (Extended Data Fig. 4). Whereas the symmetrically
trimmed canonical prespacer DNAs were integrated at both the S-site
and the L-site without showing any bias, asymmetrically trimmed
prespacer DNAs showed a bias towards the integration of the 3′-end
of PAM-deficient strands at the L-site (Extended Data Fig. 4). This finding
supports a model of stepwise integration into the CRISPR DNA, which
we validated using a smFRET assay (Extended Data Fig. 6).
Spacer orientation by delayed trimming

To complete the integration of spacers, the PAM-containing end of a
prespacer precursor must be trimmed to the canonical size of 28 nt
(5-nt 3′-overhang) for integration at the S-site of the CRISPR locus. To
determine whether DNA Pol III core and ExoT can process PAM-containing
partially trimmed 3′-overhangs, we performed an in vitro
trimming-driven integration assay with half-site intermediates (Extended Data
Fig. 7a). The PAM-containing top prespacer strands with various lengths
of 3′-overhang were processed to the canonical size and integrated,
resulting in S-site integration products (Extended Data Fig. 7b-d).

To investigate whether Cas1-Cas2 and the integration host factor
(IHF) dimer, together with prespacer trimmers, are sufficient for
spacers to be integrated in the correct orientation, we performed an
in vitro trimming-driven integration assay with a phosphorothioate
(PTO)-modified linear CRISPR DNA molecule and various prespacer
precursors (Fig. 3c, Extended Data Fig. 8a). Whereas the symmetrically
trimmed prespacer DNA (5-nt-C/5-nt-C) resulted in a comparable ratio
of correctly and incorrectly oriented full-site integration products,
the asymmetrically trimmed (5-nt-C/15-nt-CTT), partially trimmed
(7-nt-CGT/7-nt-CTT) and untrimmed (15-nt-CGT/15-nt-CTT) prespacer
precursors resulted in full-site integration products with a bias for the
PAM-containing end towards the S-site of the CRISPR DNA (Fig. 3c,
Extended Data Fig. 8b). We further confirmed the trimming-driven
integration using a smFRET assay (Extended Data Fig. 8c, d). To clarify
the role of PAM protection by the Cas1 C-terminal tail in this orientation
bias, we repeated the in vitro trimming-driven integration assay using
the PAM-binding mutant Cas1(Q287A/I291G)-Cas2, which exhibited
no bias in the integration orientation (Fig. 3c). Therefore, delayed PAM
trimming of precursor prespacer DNAs results in a strong bias for the
integration of correctly oriented spacers in the CRISPR locus, which
confers robustness to CRISPR-Cas immunity (Fig. 4).


Discussion 
Until now, the working model for the acquisition of spacers has
described prespacer binding, maturation and integration as independent
steps (Extended Data Fig. 9). This model is incomplete, however,
as it necessitates symmetrically trimmed canonical prespacer DNA
for integration, which cannot explain how the cytosine residue at the
3′-end of the PAM sequence is correctly oriented for spacer integration.
We have shown here that the processes of prespacer binding, maturation
and integration are tightly coordinated in time (Fig. 4). Cas1-Cas2
makes frequent interactions with suitable prespacer precursors among
a wide variety of substrates, including ssDNA and partially duplexed
DNA with long 3′-overhangs. Once loaded, DNA Pol III and other DnaQ-like
exonucleases process prespacer precursors into the mature versions,
while the PAM is protected by the C-terminal tail of Cas1. This
results in an asymmetric maturation of the prespacer precursors and
a bias for L-site integration of the non-PAM end of the prespacer. Next,
the PAM end of the half-site-intermediate prespacer is released from
Cas1, processed to the mature prespacer and subsequently integrated
into the S-site, resulting in correctly oriented and functional spacers in
the CRISPR loci. This asymmetry in trimming and subsequent biased
integration may be equally important in dictating correct positioning in
other types of CRISPR-Cas systems (type II, for example), but different
proteins could be involved in PAM recognition and precursor trimming.
Our single-molecule assay with ssDNA substrates revealed that
Cas1-Cas2 can capture PAM-containing ssDNA fragments and facilitate the
formation of prespacer precursors by recruiting the complementary
strand. On the basis of these findings, we propose a model in which
Cas1-Cas2 captures ssDNA fragments in the cell to acquire functional
prespacer substrates. In this model, Cas1-Cas2 transiently interacts
with ssDNA through a facilitated diffusion mechanism to search for
a PAM sequence. Cas1-Cas2 has been shown to form a complex with


Cascade-Cas3 during primed spacer acquisition. The formation of
this complex might allow ssDNA fragments to be directly transferred to
Cas1-Cas2, ensuring that primed spacer acquisition is robust.
Single-molecule studies have shown that Cas3 generates ssDNA loops during
CRISPR interference. It will be interesting to ascertain whether
ssDNA loops that are marked with PAM sequences can be directly
recognized by the Cas1-Cas2 complex (Extended Data Fig. 10). In addition,
another study suggested that Chi sites are hotspots for the naive
acquisition of new spacer candidates in vivo. In the areas around Chi
sites, proximal RecBCD forms long ssDNA loops, which might also
act as initial docking regions for Cas1-Cas2 (Extended Data Fig. 10).

The fact that DNA Pol III has DNA polymerization activity suggests
that there might be a molecular link between the trimming of prespacer
precursors and the duplication of repeats, through the trimming-driven
integration of prespacers. However, the intracellular copy numbers
of the DNA Pol III holoenzyme are low (10-20 copies per cell) and its
expression level is tightly regulated. This necessitates the involvement
of abundant 3′-5′ exonucleases for efficient processing of prespacer
precursors, and hence allows the CRISPR-Cas immune system to keep
pace with rapid infections. The contribution of ExoT may compensate
for the limited availability of DNA Pol III during the maturation of
prespacer precursors, and it is likely that several other unidentified
exonucleases are also involved.

Several groups have harnessed the nucleic-acid-acquisition abilities
of Cas1-Cas2 to develop new techniques for recording nucleic acids
in cellular contexts. Cas1-Cas2-based recording techniques allow
cellular events in prokaryotes to be captured in chronological order.
Our results may help in developing a next generation of Cas1-Cas2
recorders that are more efficient at capturing information. Moreover,
our findings may also enable a Cas1-Cas2-based recording system to
be developed in eukaryotes, which has not previously been reported.
Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Protein preparation 
Cas1-Cas2 complex was expressed in E. coli BL21-AI chemically
competent cells (Thermo Fisher Scientific, C607003) using pET52b cloned
with the Cas1-Cas2 operon with Strep-tag II (N-terminal) (pWUR871)
and purified as described. In brief, cells were grown to an optical
density at 600 nm (OD600) of 0.4 in Luria-Bertani (LB) broth medium, cooled
on ice for 30 min and induced with 0.5 mM IPTG (Gold Biotechnology,
I2481C50) and 0.2% L-arabinose (Gold Biotechnology, A-300-1). Protein
expression was induced overnight at 20 °C. Cells were collected
by centrifugation and lysed in 20 mM HEPES-NaOH pH 7.5 (HEPES,
Sigma, 3375), 75 mM NaCl (Sigma, S9888), 1 mM DTT (Gold Biotechnology,
DTT100), 5% glycerol (Sigma, G7893-1L) and 0.1% Triton X-100
(Sigma, X100-500ML) using a Stansted pressure-cell homogenizer. The
lysate was cleared by centrifugation and incubated with strep-tactin
sepharose (Iba Lifesciences, 2-1201-025) for 1 h at 4 °C. Next, the lysate
with beads was loaded onto a gravity column and washed with 20 mM
HEPES-NaOH pH 7.5, 300 mM NaCl, 1 mM DTT and 5% glycerol, followed
by elution with a buffer that contained 20 mM HEPES-NaOH pH 7.5,
75 mM NaCl, 1 mM DTT and 5% glycerol (Sigma-Aldrich, G7893-1L)
(storage buffer) with 4 mM D-desthiobiotin (Iba Lifesciences,
2-1000-005). The presence and purity of the Cas1-Cas2 complex were checked
through Bis-Tris 4-12% NuPAGE (Thermo Fisher Scientific, NPO32A) with
NuPAGE MES SDS running buffer (Thermo Fisher Scientific, NP0002).
The protein complex was concentrated with Amicon Ultra centrifugal
filters (Merck Millipore) and further purified on a Superdex 200
10/300 GL size-exclusion column (GE Healthcare, 17517501) using the ÄKTA
pure protein purification system (GE Healthcare). The final complex was
diluted in storage buffer with 50% glycerol, snap-frozen in liquid
nitrogen and stored at −80 °C. Mutant Cas1-Cas2 complexes were cloned
by site-directed mutagenesis as described previously and purified as
described above. Primers for sub-cloning are listed in Supplementary
Table 1. The ihfA and ihfB genes were PCR amplified from E. coli (BL21)
genomic DNA using the indicated primers in Supplementary Table 1,
and subsequently cloned into Berkeley MacroLab ligation-independent
cloning (LIC) vectors 13K-HR (Addgene, plasmid 48318) and 13S-A
(Addgene, plasmid 48323), respectively, as described previously.
The ihfA gene was cloned to code for an N-terminal His-tagged IHF
protein upon expression. Subsequent steps for the purification of the
IHFα and IHFβ dimer complex were performed using HisPur Ni-NTA
Resin (Thermo Fisher Scientific, 88222) as previously described. DNA
Pol III proteins were purified as described previously.

For site-specific Cas1-Cas2 biotinylation, an N-terminal LCTPSR
formylglycine generating enzyme (FGE) recognition motif on Cas1 was
inserted by site-directed mutagenesis in plasmid pWUR871 with the
primers shown in Supplementary Table 1, and co-expressed with FGE
proteins (Addgene, plasmid 16132). The purified Cas1-Cas2 complex
was buffer-changed with 0.5 M sodium acetate (pH 5.5), labelled with
EZ-Link Biotin-LC-Hydrazide (Thermo Fisher Scientific, 21340) and
incubated overnight at room temperature. Labelled Cas1-Cas2 was
purified by size-exclusion chromatography with Superdex 200 10/300
GL. Fractions were concentrated using Amicon Ultra-4 centrifugal
filters MWCO 30K (Merck, UFC803024), pooled in storage buffer with
50% glycerol, snap-frozen in liquid nitrogen and stored at −80 °C.


DNA preparation 
Synthetic DNA oligonucleotides (Ella Biotech) were internally labelled
with a monoreactive N-hydroxysuccinimide (NHS)-ester form of cyanine
dyes as donors (Cy3 mono-reactive NHS ester, GE Healthcare,
PA13101) or acceptors (Cy5 mono-reactive NHS ester, GE Healthcare,
PA15101), or EZ-Link NHS-biotin (Thermo Fisher Scientific, 20217)
at amino-C6-dT (amine modification with amino-modifier C6-T)
(Supplementary Table 1). After labelling, the ssDNA strands were
annealed in 20 mM Tris (pH 8.0), 150 mM KCl and 5 mM MgCl2 using
a thermocycler (Bio-Rad) at −1 °C per 1-min cycle from 95 °C to 16 °C,
and then stored at 4 °C.
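
As a rough guide to the length of the annealing programme, the sketch below enumerates a stepwise cool-down from 95 °C to 16 °C in 1 °C decrements of 1 min each. The function and parameter names are hypothetical, and the calculation counts only the decrements; any initial denaturation hold or final hold is not specified in the text and is therefore left out.

```python
def annealing_ramp(start_c=95, end_c=16, step_c=1, minutes_per_step=1):
    """Return (list of temperatures, total ramp time in minutes) for a stepwise cool-down."""
    if step_c <= 0 or start_c < end_c:
        raise ValueError("expected a positive step and start_c >= end_c")
    temps = list(range(start_c, end_c - 1, -step_c))
    n_decrements = len(temps) - 1  # number of cooling steps between set points
    return temps, n_decrements * minutes_per_step


temps, total_min = annealing_ramp()
print(temps[0], temps[-1], total_min)  # 95 16 79
```

With these assumptions the ramp itself takes 79 min, which is a useful lower bound when scheduling the labelling and annealing steps.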


Single-molecule TIRF imaging and data acquisition 

The fluorescent labels Cy3 and Cy5 were imaged using prism-type total
internal reflection microscopy. In brief, Cy3 was imaged through
excitation by a 532-nm diode laser (Compass 215M-50, Coherent).
Cy5 was detected by FRET with Cy3, but, if necessary, Cy5 was directly
excited using a 640-nm solid-state laser (CUBE 640-100C, Coherent).
Fluorescence signals from single molecules were collected through a
60× water immersion objective (UPlanSApo, Olympus) with an inverted
microscope (IX71, Olympus). Scattering of the 532-nm laser beam was
blocked with a 550-nm long-pass filter (LP03-532RU-25, SemRock).
When the 640-nm laser was used, 640-nm laser scattering was blocked
with a notch filter (633 ± 12.5 nm, NF03-633E-25, SemRock).
Subsequently, signals of Cy3 and Cy5 were spectrally split with a dichroic
mirror (λcutoff = 635 nm, Chroma) and imaged onto halves of an
electron-multiplying CCD camera (iXon 897, Andor Technology).

To eliminate non-specific surface adsorption of proteins and nucleic
acids to a quartz surface (Finkenbeiner), piranha-etched slides were
PEG-passivated over two rounds of PEGylation. After assembly of
a microfluidic flow chamber, slides were incubated for 10 min with
T50 buffer (Tris-HCl pH 8.0 (Sigma-Aldrich, Trizma base, T6066) and
50 mM NaCl) containing 5% Tween-20 (Sigma-Aldrich, P7949) to further
improve slide quality. Next, the chamber was incubated with 20 µl
of 0.1 mg ml−1 streptavidin (Invitrogen, S-888) for 5 min followed by a
washing step with 100 µl Cas1-Cas2 buffer (50 mM HEPES-NaOH pH
7.5, 50 mM KCl (Ambion, AM9530G) and 5 mM MgCl2). Biotinylated
Cas1-Cas2 at 0.2-1 nM was specifically immobilized through
biotin-streptavidin linkage by incubating the chamber for 5 min. Remaining
unbound biotin-Cas1-Cas2 was flushed away with 100 µl Cas1-Cas2
imaging buffer (50 mM HEPES-NaOH pH 7.5, 50 mM KCl, 5 mM MgCl2
(Ambion, AM9640G), glucose oxidase (Sigma, G2133), 4 mg ml−1
catalase (Roche, 10106810001) and 1 mM Trolox
((±)-6-hydroxy-2,5,7,8-tetramethylchromane-2-carboxylic acid, Sigma, 238813)).
Immobilized Cas1-Cas2 was incubated with 0.5-2 nM labelled DNA at room
temperature (23 ± 1 °C) for the indicated times.

To visualize the dynamics of prespacer and prespacer precursor
DNA binding on Cas1-Cas2, Cy3 molecules were excited on an area of
50 × 50 µm² with a green laser (532 nm) at 28% of the full laser power
(9 mW), and the time resolution was set to 0.1 s. Under these imaging
conditions we obtained a high signal-to-noise ratio that allowed us
to visualize kinetic intermediates while imaging over time periods of
3.5 min. Under these conditions, photobleaching of the donor and
acceptor dyes during our observation time was minimized.


Single-molecule data analysis 

A series of CCD images were acquired with laboratory-made software
at a time resolution of 0.1 s. Fluorescence time traces were extracted
with an algorithm written in IDL (ITT Visual Information Solutions) that
picked fluorescence spots above a threshold with a defined Gaussian
profile. The extracted time traces were analysed using custom-written
MATLAB (MathWorks) algorithms. FRET efficiency was defined
as the ratio between the acceptor intensity and the sum of the acceptor
and donor intensities. To determine the dissociation rate ($k_{\mathrm{off}}$),
the start and end of each binding event were determined (Fig. 1b).
The start of each event was marked by an abrupt increase in the
fluorescence signal, whereas the end of each event was marked by an
abrupt decrease (Fig. 1b). Selecting the start and end of each event
yielded the duration of each event, which was plotted in a histogram.
These dwell-time distributions were fitted with a single-exponential
decay using maximum-likelihood estimations (Extended Data Figs. 1c,
i, m, 3c). This fit yielded the average dwell time ($\tau_{\mathrm{avg}}$), which was then
converted to the dissociation frequency (rate)
($k_{\mathrm{off}} = 1/\tau_{\mathrm{avg}}$). The 95% CIs
(errors) of the dissociation frequencies were obtained by empirical
bootstrap analysis.
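
The analysis described above was implemented in IDL and MATLAB; the Python sketch below is only an illustration of the same ideas under simplifying assumptions. For a single-exponential dwell-time distribution, the maximum-likelihood estimate of the mean dwell time is the sample mean, so the dissociation rate is its reciprocal. The bootstrap error is taken as 1.96 times the standard deviation of the resampled estimates. Function names, the number of resamples and the synthetic data are hypothetical, and the finite time resolution and observation window are ignored.

```python
import random
import statistics


def fret_efficiency(donor: float, acceptor: float) -> float:
    """Apparent FRET efficiency: acceptor / (acceptor + donor)."""
    return acceptor / (acceptor + donor)


def koff_mle(dwell_times):
    """MLE of the rate of a single-exponential distribution: 1 / mean dwell time."""
    return 1.0 / statistics.fmean(dwell_times)


def bootstrap_ci95(dwell_times, n_boot=1000, seed=0):
    """Half-width of the 95% CI as 1.96 x s.d. of bootstrap rate estimates."""
    rng = random.Random(seed)
    n = len(dwell_times)
    estimates = [koff_mle(rng.choices(dwell_times, k=n)) for _ in range(n_boot)]
    return 1.96 * statistics.stdev(estimates)


# Synthetic dwell times drawn with a true rate of 2 per second (illustration only)
rng = random.Random(1)
dwells = [rng.expovariate(2.0) for _ in range(500)]
k = koff_mle(dwells)
ci = bootstrap_ci95(dwells)
print(f"k_off = {k:.2f} +/- {ci:.2f} per second")
```

The estimate should scatter around the rate used to simulate the data, with a confidence interval that narrows as more binding events are included.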

The binding frequency was determined by measuring the time from
flow-in of the DNA substrate to the occurrence of the first binding event.
These characteristic times were plotted as a cumulative histogram and
fitted with a single-exponential decay using maximum-likelihood
estimation (Extended Data Figs. 1h, j, l, 2d, 3b). This fit yielded the average
arrival time ($\tau_{\mathrm{on}}$), which was then converted to the binding frequency
($k_{\mathrm{on}} = 1/(\tau_{\mathrm{on}} \cdot c)$), where $c$ is the concentration of DNA. The 95% confidence
intervals (errors) of the binding frequencies were obtained by
empirical bootstrap analysis.
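
The conversion from arrival times to a binding frequency can be sketched as follows. This is a minimal illustration, assuming that the average arrival time is estimated by the sample mean (the maximum-likelihood estimate for a single exponential) and that the DNA concentration is given in nM; the function name and the example numbers are hypothetical.

```python
import statistics


def binding_frequency(arrival_times_s, dna_conc_nM):
    """Binding frequency in nM^-1 s^-1: 1 / (mean arrival time x DNA concentration)."""
    if dna_conc_nM <= 0:
        raise ValueError("DNA concentration must be positive")
    mean_arrival = statistics.fmean(arrival_times_s)
    return 1.0 / (mean_arrival * dna_conc_nM)


# Hypothetical arrival times (s) at 1 nM DNA; the mean arrival time is 20 s
print(binding_frequency([10.0, 20.0, 30.0], dna_conc_nM=1.0))  # 0.05
```

Dividing by the concentration makes values measured at different DNA concentrations directly comparable.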

To obtain survival rates of the long-lived population, fluorescently
labelled prespacer DNA was incubated for 10 min in the microfluidic
chamber. After washing the remaining unbound molecules (t = 0), the
bound population was tracked over a time of 45 min. To avoid
photobleaching, short snapshots of 10 frames were taken over 20 fields of
view at each time point, providing the average number of molecules
bound to Cas1-Cas2. For the subsequent analysis, the number of lost
molecules at each time point was subtracted from the total number of
molecules bound at t = 0. This yielded survival rate curves that were
fitted with a single-exponential decay (Extended Data Fig. 1n).
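
The survival analysis can be illustrated with the sketch below, which normalizes the number of bound molecules to the count at t = 0 and estimates a decay rate. Note that the fit here is a simple log-linear least-squares regression constrained through the origin, swapped in for brevity; it is not the fitting routine used in the study. The function names and the molecule counts are hypothetical.

```python
import math


def survival_fractions(counts):
    """Fraction of molecules still bound at each time point, relative to t = 0."""
    n0 = counts[0]
    if n0 <= 0:
        raise ValueError("the count at t = 0 must be positive")
    return [c / n0 for c in counts]


def decay_rate_loglinear(times, fractions):
    """Decay rate from a least-squares fit of ln(fraction) = -rate * t through the origin."""
    pairs = [(t, math.log(f)) for t, f in zip(times, fractions) if t > 0 and f > 0]
    if not pairs:
        raise ValueError("need at least one time point with t > 0 and fraction > 0")
    numerator = sum(t * y for t, y in pairs)
    denominator = sum(t * t for t, _ in pairs)
    return -numerator / denominator


# Hypothetical average numbers of bound molecules at 0, 15, 30 and 45 min
times_min = [0, 15, 30, 45]
counts = [120, 90, 70, 50]
fractions = survival_fractions(counts)
print(fractions)
print(decay_rate_loglinear(times_min, fractions))  # per minute
```

A slower decay rate indicates a more stably bound population over the 45-min tracking period.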


Quantification and statistical analysis 

Histograms and fits were generated using OriginPro (OriginLab). The
averages and errors representing the number of bound molecules
(Fig. 1f, Extended Data Fig. 2b), the survival time (Fig. 1g, Extended
Data Fig. 1n), the cumulative probability of the molecule arrival
time (Figs. 1d, h, i, 2a, d, Extended Data Figs. 1h, j, l, 2d, 3b, d), the
average dwell time of events (Figs. 1e, h, i, 2b, Extended Data Figs. 1c, i, m,
3c, e) and the FRET population analysis (Extended Data Figs. 1b, f, g,
p, q, 2f, g, 6e, 8d) encompass a minimum of three replicates (n). The
errors represent s.e.m., which was defined as s.e.m. = $\sigma/\sqrt{n}$. The averages
and errors displayed in the figures were obtained through bootstrap
analysis. In brief, for bootstrap analysis, 10* datasets were generated by
random sampling with replacement from the original dataset. Each of
these datasets was fitted with the respective fit (indicated in the figure
legend) and then used to calculate the average and 95% CI, which was
defined as CI(95) = $1.96\sigma$.
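
The two error definitions above can be written out as a minimal sketch. It assumes that σ is the sample standard deviation (n − 1 denominator), which the text does not specify; the function names and the example values are hypothetical.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sigma / sqrt(n), with sigma the sample s.d."""
    return statistics.stdev(values) / math.sqrt(len(values))


def ci95_half_width(sigma):
    """Half-width of the 95% confidence interval, defined as 1.96 x sigma."""
    return 1.96 * sigma


replicates = [2.0, 4.0, 6.0]  # hypothetical values from three replicates
print(sem(replicates))
print(ci95_half_width(statistics.stdev(replicates)))
```

For the bootstrap-derived errors, σ would be the standard deviation of the fitted parameter across the resampled datasets rather than across replicates.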


In vitro integration assay
Integration assays with fluorescently labelled prespacer and prespacer
precursor DNA were performed using 200 nM Cas1-Cas2 complex, 500
nM IHF dimer, 20 nM prespacer or prespacer precursor DNA and 40
nM CRISPR DNA in a final reaction with Cas1-Cas2 integration buffer
(50 mM HEPES-NaOH pH 7.5, 50 mM KCl, 5 mM MgCl2 and 5% PEG8000
(Sigma-Aldrich, P2139)). The Cas1-Cas2 complex was incubated with
prespacer or prespacer precursor DNA to allow complex formation for
30 min at room temperature. Subsequently, IHF dimer was incubated
with CRISPR DNA in a separate tube. The reaction was activated by
adding the prespacer-Cas1-Cas2 complex to the IHF-CRISPR DNA
mix, and incubated for the indicated times at 37 °C. For full-site
integration experiments of half-site intermediate constructs, 20 nM half-site
integration products were incubated with 500 nM IHF dimer for 30 min
at room temperature, followed by the addition of 200 nM Cas1-Cas2.
This mixture was incubated at room temperature for 10 min, allowing
Cas1-Cas2 and IHF dimer to assemble with the DNA. Next, 100 nM DNA
Pol III core, 2.5 U ExoT (New England Biolabs, M0265) or no enzyme
was added, followed by incubation at 37 °C for the indicated times.
For trimming-driven integration assays, prespacer-Cas1-Cas2 and
IHF-CRISPR DNA mixtures were combined and incubated at room
temperature for 10 min. The combined mixtures were supplemented with
100 nM DNA Pol III core or 2.5 U ExoT, followed by incubation at 37 °C
for the indicated times. To quench the reactions, DNA loading buffer
(final concentration 12.5 mM EDTA (Ambion, AM9261) and 47.5%
formamide (Roche, 11814320001)) was added and thoroughly mixed with
samples. The samples were heated at 95 °C for 10 min and immediately
loaded and run on 15 × 15-cm²-sized 7 M urea (Sigma-Aldrich, 316830)
denaturing 9% or 12% polyacrylamide 1× Tris-borate-EDTA (TBE) gels.
The gels were pre-run for 2 h and run for 2-3 h at 370 V in 0.5× TBE buffer
(Promega, V4251). Fluorescence signals from gels were analysed in an
Amersham Typhoon biomolecular imager.


In vitro trimming assay

For in vitro trimming assays, 20 nM prespacer or prespacer precursor
DNA was incubated with 200 nM Cas1-Cas2 in Cas1-Cas2 integration
buffer containing 10 mM DTT and 10% PEG8000 at room temperature
for 30 min, and then supplied with each exonuclease in the indicated
amounts: 1 U DNA Pol I (New England Biolabs, M0209), 100 nM DNA
Pol III core (for holoenzyme, 33.3 nM clamp loader, 200 nM β-clamp and
10 nM DnaBC helicase were also added, but without single-stranded
DNA binding protein (SSB)), 1 U ExoI (New England Biolabs, M0293),
1 U RecBCD (New England Biolabs, M0345), 0.5 U ExoVII (New England
Biolabs, M0379) or 1 U ExoT. After incubation at 37 °C for the indicated
times, the reaction was quenched with DNA loading buffer and
subsequently heated at 95 °C for 10 min for 15 × 15-cm²-sized 7 M urea
denaturing 20% TBE-PAGE. Fluorescence signals from gels were analysed
in an Amersham Typhoon biomolecular imager.


Electromobility shift assay 

Binding assays were performed in buffer containing 50 mM
HEPES-NaOH pH 7.5, 50 mM KCl, 5 mM MgCl2, 5% PEG8000, 5% glycerol and
1 mM DTT without (Extended Data Fig. 1r) or with (Extended Data
Fig. 3g) 25 mM EDTA. Each reaction contained 10 nM dye-labelled
prespacer DNAs or ssDNAs at increasing concentrations (0-200 nM)
of Cas1-Cas2. The reactions were incubated at room temperature for
30 min and resolved at 4 °C on 4% native agarose (Promega, V3121)
gels containing 1× Tris-acetate-EDTA (TAE) buffer (Promega, H5231).
Fluorescence signals from gels were analysed in an Amersham Typhoon
biomolecular imager.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability

All data generated or analysed during this study are either included
in this Article and its Supplementary Information or available from
the corresponding authors on request. Source Data for Figs. 1, 2 and
Extended Data Figs. 1-3, 6, 8 are provided with the paper.
Extended Data Fig. 1 | Single-molecule and biochemical analysis of the effect
of 3′-overhang length. a, Schematic of the single-molecule TIRF set-up that was
used for measuring Cas1-Cas2 binding to canonical or precursor prespacer
DNA. b, FRET efficiency histograms of binding events observed for canonical
and precursor prespacer DNAs with 3′-overhangs of various lengths. c, Dwell-time
(Δτ) distributions and average binding time ($\tau_{\mathrm{avg}}$) determination for
canonical and precursor prespacer DNAs with 3′-overhangs of various lengths.
d, Left, representative time trace from a binding assay in the absence of
Cas1-Cas2. DNA was added at t = 5 s. Right, image of a field of view with Cy3
signals on the left and Cy5 signals on the right. The image was recorded 1 min
after the addition of DNA. e, Schematics of precursor prespacer DNAs with
different labelling positions. f, FRET efficiency histograms of individual
precursor prespacer DNAs with different labelling positions bound to
Cas1-Cas2. g, FRET distribution and fractions of precursor prespacer DNAs
from a single-molecule competition assay using precursor prespacer DNAs
with different labelling positions. Histograms were obtained by incubating
equal concentrations of precursor prespacer DNAs. To track the stably bound
population, the flow chamber was washed and the fluorescence signals of the
remaining population were measured. h, Cumulative distribution of the arrival
times for binding events and $k_{\mathrm{on}}$ for precursor prespacer DNAs with different
labelling positions. i, Dwell-time distributions and $k_{\mathrm{off}}$ of binding events for
precursor prespacer DNAs with distinct labelling positions. j, Cumulative
probability of the arrival times for precursor prespacer DNAs with 3′-overhangs
of various lengths. k, Schematics of precursor prespacer DNAs with
3′-overhangs of various lengths. Each DNA construct consists of a 23-bp central
duplex and 5-nt 3′-overhangs at both ends, which are further extended with N
number of single-stranded deoxythymidine (dT) nucleotides. Both strands
were labelled with a Cy3 fluorophore at the 5′ end of the top strand and a Cy5
fluorophore at the 16th nucleotide (T) from the 5′ end. l, Cumulative
distribution of the arrival times for binding events and $k_{\mathrm{on}}$ for precursor
prespacer substrates with 3′-overhangs of various lengths. m, Dwell-time
distributions of binding events and $k_{\mathrm{off}}$ for precursor prespacer DNAs with
3′-overhangs of various lengths. n, Survival probability of stably bound
substrates with 3′-overhangs of various lengths. The solid lines represent
single-exponential fits using maximum-likelihood estimation. o, Schematics of
canonical and precursor prespacer DNAs with different labelling positions and
3′-overhang lengths for a single-molecule competition experiment. p, q, FRET
distributions (p) and fractions (q) of each FRET population from
single-molecule competition experiments for canonical and precursor prespacer
DNAs with 3′-overhangs of various lengths. ‘Before washing’ includes both
transient and stably bound molecules; ‘after washing’ includes only the stably
bound molecules. r, EMSAs on various canonical and precursor prespacer DNA
substrates with increasing amounts of wild-type Cas1-Cas2. The top and
bottom strands were labelled at the 5′ end with Cy3 and Cy5, respectively.
Cas1-Cas2-bound and unbound precursor prespacer DNAs are indicated on the
right. For b, f, g, p, solid lines represent Gaussian fits; the centre of each peak
corresponds to the predetermined position of each individual construct in g,
p. For c, h-j, l, m, solid lines represent single-exponential fits
(maximum-likelihood estimation) that were used to determine the binding frequency ($k_{\mathrm{on}}$)
(h, j, l) and dissociation rate ($k_{\mathrm{off}}$) (i, m). For the cumulative probability of the
arrival times or $k_{\mathrm{on}}$ bar plots and dwell times or $k_{\mathrm{off}}$ bar plots, data are

‘mean 95% Cl, obtained by bootstrap analysis of asingle replicate with n>300 
(hj, or 22500 (c,i,m) individual molecules. For the FRET fractions, dataare 
mean ± s.e.m. from three independent measurements (n = 3) with n > 5,000
molecules for each measurement (g, q). Data are representative of three
replicates with similar results (b, c, f, h-j,
Extended Data Fig. 2 | Single-molecule analysis of the effect of the PAM
position and sequence. a, Schematics of precursor prespacer DNA with the
PAM sequence at different positions in the 3′-overhang. b, Average number of
molecules bound per field of view after a 30-min incubation with precursor
prespacer DNAs. Data are mean ± s.e.m. (n = 3). Representative CCD images
(acceptor channel) are included as insets. Scale bars, 5 µm. c, Structural
comparison of Cas1-Cas2 precursor prespacer complexes. Cas1-Cas2 in
complex with a non-PAM (5′-TTT-3′, orange)-containing substrate with 10-nt
3′-overhangs (PDB: 5DLJ); a PAM (5′-CTT-3′, red)-containing substrate with 8-nt
3′-overhangs (PDB: 5DQZ); and a non-PAM substrate with 5-nt 3′-overhangs that
end with a T (PDB: 5DS5; left) or a C (PDB: 5DS5; right). The C-terminal
proline-rich tail of Cas1b and the flexible internal lid-like loop region of Cas1a are
highlighted in blue and magenta, respectively. The magnified image on the
right represents the molecular architecture of the PAM-recognizing residues of
Cas1-Cas2. The residues of the PAM sequence (C, red; T, blue; T, orange)
are coloured, together with the PAM-interacting residues of Cas1.
d, Cumulative probability of the arrival times for precursor prespacer DNAs
with different PAM sequences. A single-exponential fit (solid line) was used to
determine the binding frequency ($k_{\mathrm{on}}$). Data are mean ± 95% CI obtained by
bootstrap analysis from a single replicate with n > 300 individual molecules.
e, Schematics of the design of precursor prespacer DNAs with different PAM
sequences. f, g, FRET distributions (f) and fractions (g) of each FRET
population from single-molecule competition experiments for different
sequences at the PAM position. Each population was fitted with a Gaussian
distribution (solid line); the centre of each peak corresponds to the
predetermined position of each individual construct (f). For the fractions, data
are mean ± s.e.m. from three independent measurements (n = 3) with n > 5,000
molecules per each measurement (g). Data are representative of three
replicates with similar results (d, f, g).


[Extended Data Fig. 3 image not reproduced. Panels a-c: ssDNAs with 0, 1, 2, 3 or 4 PAM sites (axes: Time (s), Arrival time (s), Dwell time (s)). Panel f: short and long ssDNA with PAM. Panel g: EMSA gels with increasing Cas1-Cas2 (nM); Cas1-Cas2-bound and unbound species are marked.]


Extended Data Fig. 3 | PAM-dependent ssDNA capture by Cas1-Cas2.
a, Representative time traces of a single Cas1-Cas2 complex binding to ssDNAs
that contain different numbers of PAM sites. DNA was added at t = 5 s. The insets
show a snapshot of the field of view taken after a 10-min incubation.
b, Cumulative probability of the arrival times with a single-exponential fit
(solid line) that was used to determine the binding frequency (kon). Data are
mean ± 95% CI, obtained by bootstrap analysis from a single replicate with
n > 300 individual molecules. c, Dwell-time distributions of binding events for
ssDNAs that contain different numbers of PAM sites. Average dwell times (τavg)
are mean ± 95% CI, obtained by bootstrap analysis of a single replicate with
n > 500 individual molecules. d, kon of ssDNA substrates of various lengths,
containing one PAM site. Data are mean ± 95% CI, obtained by bootstrap
analysis from a single replicate with n > 300 individual molecules. e, koff of
ssDNA substrates of various lengths, containing one PAM site. Data are
mean ± 95% CI, obtained by bootstrap analysis of a single replicate with n > 500
individual molecules. f, Model of a facilitated diffusion mechanism for
PAM-dependent ssDNA binding by Cas1-Cas2. Cas1-Cas2 binds a non-specific
(non-PAM) region on ssDNA, which is followed by rapid facilitated diffusion and
PAM recognition. Although the diffusive movement cannot be directly
observed with the time resolution of our single-molecule assay (0.1 s), the
effects can be seen when the ssDNA substrate is extended (see d). When the
length is increased, the measured binding frequency increases (commonly
referred to as the antenna effect), which suggests that Cas1-Cas2 uses
facilitated diffusion to locate PAM sequences. g, EMSAs on various ssDNA and
dsDNA substrates with increasing amounts of wild-type Cas1-Cas2. Top, EMSAs
with ssDNAs without (Cy3) or with (Cy5) a PAM sequence. Bottom, EMSAs with a
precursor prespacer DNA substrate that is annealed, or with two ssDNAs added
simultaneously. The bands that correspond to the bound and unbound
fractions are indicated on the right. For gel source data, see Supplementary
Fig. 1. Data are representative of three replicates with similar results (a-e, g).
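
The rate estimates with bootstrap 95% confidence intervals described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the arrival times are simulated, the rate is obtained from the maximum-likelihood estimate for an exponential distribution (number of events divided by the summed times) rather than from a least-squares fit to the cumulative curve as in the caption, and the function names `fit_rate` and `bootstrap_ci` are hypothetical.

```python
import random


def fit_rate(times):
    """Maximum-likelihood rate k for the model F(t) = 1 - exp(-k * t)."""
    return len(times) / sum(times)


def bootstrap_ci(times, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the fitted rate."""
    rng = random.Random(seed)
    n = len(times)
    # Resample the molecules with replacement and refit the rate each time.
    estimates = sorted(
        fit_rate([times[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    low = estimates[int((alpha / 2) * n_boot)]
    high = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return low, high


# Simulated arrival times (s) for 500 molecules with an assumed true rate of 0.05 per s.
sim = random.Random(1)
arrival_times = [sim.expovariate(0.05) for _ in range(500)]

k_on = fit_rate(arrival_times)
ci_low, ci_high = bootstrap_ci(arrival_times)
print(f"k_on = {k_on:.4f} per s (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

Resampling whole molecules, rather than histogram bins, keeps the interval tied to the number of individual molecules observed in the single replicate.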
Extended Data Fig. 4 | In vitro integration assay with precursor prespacer
DNAs with 3′ overhangs of various lengths. In vitro integration assay using a
linear CRISPR DNA and canonical or precursor prespacer DNAs with
3′ overhangs of different lengths. Full-site integration of a mature prespacer
DNA (28 nt) results in a 78-nt leader-side integration (L1) product and a 13-nt
spacer-side integration (S+1) product. The top and bottom strands of the
canonical or precursor prespacer DNA substrates were labelled with Cy5 and
Cy3, respectively. Samples were run on a 7 M urea denaturing 20% TBE-PAGE,
after which images were collected with a Typhoon scanner. Only those
precursor prespacers with the canonical size of 5 nt (5′-TTTTC-3′) in the
3′ overhang(s) were efficiently incorporated into the CRISPR locus to yield
leader-side (spacer-side) integration products. For gel source data, see
Supplementary Fig. 1. Data are representative of three replicates with similar
results.
Extended Data Fig. 5 | In vitro trimming assay with candidate exonucleases
for 3′-overhang trimming. a, In vitro trimming assay with a precursor
prespacer DNA in the presence of various 3′-5′ exonucleases. The precursor
prespacer DNAs used were the same as in Fig. 3a. b, In vitro assay for PAM-
dependent trimming. c, d, In vitro trimming assay with wild-type Cas1-Cas2 or
mutant Cas1(Q287A/I291G)-Cas2 in the presence of DNA Pol III core (c) or ExoT
(d). The two strands of the precursor prespacer DNA were internally labelled
with Cy3 and Cy5. Samples were collected after the indicated times of
incubation with exonucleases. Samples were run on a 7 M urea denaturing 20%
TBE-PAGE, after which images were collected with a Typhoon scanner. The
canonical size (28 nt) of trimmed strands is indicated with red arrowheads. For
gel source data, see Supplementary Fig. 1. Data are representative of three
replicates with similar results.
Extended Data Fig. 6 | Single-molecule leader-side integration assay for the
PAM-deficient end of asymmetrically trimmed precursor prespacer DNAs.
a, Schematic of the single-molecule FRET assay that was used to observe the
orientation of integrated spacers. Biotinylated CRISPR DNA was labelled with
Cy5 in the repeat region (5 nt away from the leader-repeat junction). Precursor
prespacer DNA was labelled with Cy3 at the 5′ end of the top strand. b, Expected
FRET from the single-molecule assay based on structural modelling (PDB:
5WFE). Representative CCD images in donor (green box) and acceptor (red
box) channels are included as insets and indicated with representative high and
low FRET states. c, smFRET design for assessing the orientation of integrated
products. The 5′ end of the top strand was labelled with Cy3. Integration
of the 3′ end of the bottom strand at the leader side exhibits high FRET, and
integration of the 3′ end of the top strand shows low FRET. d, Fractions of high-
and low-FRET events after the integration reaction. Data are mean ± s.e.m. from
three independent measurements (n = 3) with n > 3,000 molecules for each
measurement. e, FRET efficiency histograms of precursor prespacer DNAs
with a 3′ overhang length that is optimal (28-nt) for the PAM-deficient strand
and non-optimal for the PAM-containing strand. Solid lines represent Gaussian
fits to obtain the high and low FRET populations. Data are representative of
three replicates with similar results.
Extended Data Fig. 7 | Maturation and integration of the PAM-containing
end in half-site intermediates. a, Design for in vitro and single-molecule
trimming-driven integration assays. The last three backbone phosphodiester
bonds from the 3′ end of CRISPR DNA were modified with PTO (purple) to
prevent degradation by 3′-5′ exonucleases. b, Schematic of the substrates used
for the in vitro trimming-driven full-site integration assay. c, d, Gel images from
the in vitro trimming-driven full-site integration assay with DNA Pol III core (c)
or ExoT (d). Unreacted half-site (H-S) intermediates and disintegrated products
are shown in the Cy3 image. Spacer-side integration products and processed
top prespacer strands are shown in the Cy5 images. For clarity, the bottom part
(below 50 nt) of the Cy5 image was separated from the top part (around
70-130 nt) and adjusted with a different contrast. For gel source data, see
Supplementary Fig. 1.
Extended Data Fig. 8 | Reconstitution of trimming-driven integration.
a, Design of in vitro and single-molecule trimming-driven integration assays.
The last three backbone phosphodiester bonds from the 3′ end of CRISPR DNA
were modified with PTO (purple) to prevent degradation by 3′-5′ exonucleases.
b, Representative gel images from the in vitro trimming-driven integration
assay with DNA Pol III core. The contrast of areas of spacer-side and leader-side
integration products was adjusted for optimal visibility. For gel source data,
see Supplementary Fig. 1. c, d, Single-molecule assay for biased integration of
PAM-containing precursor prespacer DNA. c, Schematic of the experimental
procedure used for trimming-driven integration assays at the single-molecule
level. d, FRET efficiency histograms from the trimming-driven integration
assay. The Cy3-labelled top strand of precursor prespacer DNAs had either CGT
(non-PAM) or CTT (PAM). Solid lines represent Gaussian fits to obtain the high
and low FRET populations. The bar plot displays fractions of high and low FRET
populations after the integration reaction. Data are presented as mean ± s.e.m.
(n = 6). Data are representative of three replicates with similar results.
Photosynthetic organisms have developed various light-harvesting systems to
adapt to their environments. Phycobilisomes are large light-harvesting protein
complexes found in cyanobacteria and red algae, although how the energies of
the chromophores within these complexes are modulated by their environment is
unclear. Here we report the cryo-electron microscopy structure of a 14.7-megadalton
phycobilisome with a hemiellipsoidal shape from the red alga Porphyridium
purpureum. Within this complex we determine the structures of 706 protein subunits,
including 528 phycoerythrin, 72 phycocyanin, 46 allophycocyanin and 60 linker
proteins. In addition, 1,598 chromophores are resolved comprising 1,430
phycoerythrobilin, 48 phycourobilin and 120 phycocyanobilin molecules. The
markedly improved resolution of our structure compared with that of the
phycobilisome of Griffithsia pacifica enabled us to build an accurate atomic model of
the P. purpureum phycobilisome system. The model reveals how the linker proteins
affect the microenvironment of the chromophores, and suggests that interactions of
the aromatic amino acids of the linker proteins with the chromophores may be a key
factor in fine-tuning the energy states of the chromophores to ensure the efficient
unidirectional transfer of energy.


Light absorption is the first step of photosynthesis. The membrane-extrinsic
soluble phycobilisomes (PBSs) are responsible for the
majority of light capture in cyanobacteria and red algae. PBSs are
composed of phycobiliproteins (PBPs) and linker proteins, and sunlight
is absorbed and the energy transferred by open-chain tetrapyrrole
chromophores that covalently bind to PBPs and some linker proteins.
A heterodimer of two different PBP subunits (α and β subunits) assembles
into a ring-shaped (αβ)3 trimer, which serves as the basic unit for
the PBS assembly. The typical PBS consists of several peripheral rods
surrounding the central core. Solar photonic energy absorbed by the
peripheral rods can be rapidly funnelled to the core and eventually to
the terminal emitters, which are chromophores of the core-membrane linker
protein (LCM) or allophycocyanin D (ApcD), and is then transferred to
photosystems I and II.

Four morphological types of PBS have been observed: hemidiscoidal,
hemiellipsoidal, block-type and bundle-type. We recently
solved the structure of the block-shaped PBS from the red alga G.
pacifica at 3.5 Å resolution, which provided the detailed architecture
of the intact PBS. However, to our knowledge there have been no high-
resolution structures reported for other morphological types of PBS.
Moreover, although we determined the locations of all chromophores
of the G. pacifica PBS, owing to resolution limitations we could not
reveal how the energies of the bilins are modulated by their surroundings.
To address these questions, we resolved the structure of a typical
hemiellipsoidal PBS from Porphyridium purpureum,
one of the few unicellular red algae and a widely used model alga.
The resulting structure, determined by cryo-electron microscopy
(cryo-EM) at 2.82 Å resolution, reveals how the linker proteins affect
the microenvironments of chromophores.


Overall structure 

Intact PBSs were purified from P. purpureum and their subunit composition
and spectroscopic properties are shown in Extended Data
Fig. 1. The overall resolution of the resulting structure is 2.82 Å, with
a higher resolution of 2.68 Å for the core region (Fig. 1a, Extended
Data Fig. 2, Extended Data Table 1). Some long loops in LCM, which
are absent in the electron microscopy map of the G. pacifica PBS, are
clearly resolved in this reconstruction (Fig. 1b). The different types
of bilins can be unambiguously assigned on the basis of the densities
and the dihedral angles in combination with the results of published
biochemical analysis that classified the phycocyanin in P. purpureum
as R-phycocyanin, containing one phycoerythrobilin (PEB) and one
phycocyanobilin (PCB) on the β subunit (Fig. 1c, Extended Data Fig. 3).
In total, we built 706 protein subunits comprising 528 phycoerythrin
subunits, 72 phycocyanin subunits, 46 allophycocyanin subunits and
60 linker proteins, and we assigned 1,598 chromophores (Extended
Data Table 2a).
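
As a quick consistency check of the inventory quoted above, the subunit and chromophore counts stated in the text can be summed in a few lines. The dictionary and variable names are illustrative only; the numbers are those given in the article.

```python
# Protein subunit counts stated in the text.
subunits = {
    "phycoerythrin": 528,
    "phycocyanin": 72,
    "allophycocyanin": 46,
    "linker protein": 60,
}

# Chromophore counts stated in the text.
chromophores = {
    "phycoerythrobilin": 1430,
    "phycourobilin": 48,
    "phycocyanobilin": 120,
}

total_subunits = sum(subunits.values())
total_chromophores = sum(chromophores.values())

# Both totals should match the figures quoted in the article (706 and 1,598).
print(total_subunits, total_chromophores)

# Share of each chromophore type, as a percentage of all assigned chromophores.
for name, count in chromophores.items():
    print(f"{name}: {100 * count / total_chromophores:.1f}%")
```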

The two-fold symmetric PBS resembles an opened fan from the face
view and has an oval outline from the top view, with approximate dimensions
of 610 Å (length), 390 Å (height) and 380 Å (thickness) (Fig. 1d).
It contains 14 peripheral rods surrounding a pyramidal-shaped core
(Extended Data Fig. 4a). The core contains one top cylinder (B) formed
by two allophycocyanin trimers stacked back to back, and two bottom
Fig. 1 | Overall architecture of the PBS from P. purpureum. a, Local resolution
map of the PBS from P. purpureum. The map was estimated with ResMap and
generated in Chimera. b, The density (mesh) for the linker protein LCM
superimposed with its atomic model (cartoon). Three loops (red), V77 to
A98, F132 to M145 and L656 to L687, are clearly resolved. c, The densities
(mesh) of the representative PCB, PEB and phycourobilin (PUB) bilins (stick
representation) show their different coplanarities. d, Overall structure of the
PBS shown in surface representation. The rods are shown in different colours,
and the core is coloured green. All extra hexamers, individual monomers and β


cylinders (A and A′), each of which is assembled by one (αβ)6 allophycocyanin
hexamer and one allophycocyanin trimer (Extended Data
Fig. 4b). The rods are categorized into two types according to their PBP
composition: type I rods (Ra/Ra′, Rb/Rb′ and Rc/Rc′) are composed of
both phycoerythrin and phycocyanin, whereas type II rods (Rd/Rd′,
Re/Re′, Rf/Rf′ and Rg/Rg′) are composed entirely of phycoerythrin
(Extended Data Fig. 4d). Except for rods Rf/Rf′ and Rg/Rg′, each of
which contains two phycoerythrin hexamers in both PBSs, the number
of phycoerythrin hexamers in each of the remaining rods is one fewer
in the P. purpureum PBS than in the G. pacifica PBS (Extended Data
Fig. 4c, d). The P. purpureum PBS also contains extra phycoerythrin
hexamers; however, the hexamers He/He′, which are located near the
surface of the outermost hexamer of rods Rb/Rb′ and Rc/Rc′ in the
G. pacifica PBS, are absent owing to the short lengths of these rods
in the P. purpureum PBS (Fig. 1d, Extended Data Fig. 4c). There are 2
individual phycoerythrin (αβ) monomers M1 (M1′) and 20 individual
phycoerythrin β subunits S1-S10 (S1′-S10′) interspersed throughout
the whole PBS (Fig. 1e). These components fill the empty spaces
outside the rods, core and extra hexamers, and may stabilize the PBS.
The P. purpureum PBS is aligned well with the G. pacifica PBS, but has a
smaller size owing to the reduced number of phycoerythrin hexamers,
indicating a similar overall organization of the rods and core (Fig. 1d,
Extended Data Fig. 4c). The molecular mass of the P. purpureum PBS is




subunits, and linker proteins are coloured light green, wheat and red,
respectively. The superimposed structure of the G. pacifica PBS is coloured
grey. e, Schematic showing the distribution of the individual phycoerythrin
(αβ) monomers and phycoerythrin β subunits. One half (M1 and S1-S10) are
coloured wheat and the other half are grey. f, Structures of all linker proteins
shown in cartoon representation, from the same view as in d. They are the same
colours as the hexamers in which they are located. The superimposed
structures of linker proteins of the G. pacifica PBS are coloured grey.


14.7 MDa, which is less than that of the G. pacifica PBS (18.0 MDa) after
considering the molecular masses of the chromophores.

The skeleton formed by the linker proteins is very similar in the
PBSs from both P. purpureum and G. pacifica (Fig. 1f), and both
contain 17 types of linker protein. Superimposing the two PBSs indicates
that they share very high structural conservation, except for the
rod linker protein LR6 (Extended Data Fig. 4e). LR6 of the P. purpureum
PBS contains the Pfam00427 domain, instead of the CBDγ domain
that is present in LRγ6 of the G. pacifica PBS; this is in accordance with
the overall tendency for the P. purpureum PBS to contain fewer bilins
than the G. pacifica PBS. The roles of linker proteins in the assembly
of the PBS (such as the sequential interactions between them to form
the skeleton, the extensive contacts between them and the hexamers,
and the α-helix-mediated interactions between LRC proteins and the
core) are common between these two PBSs, highlighting their evolutionary
conservation (Extended Data Fig. 5).

There are 120 PCBs, 1,430 PEBs and 48 phycourobilins in the P. purpureum
PBS (Extended Data Table 2a). The phycourobilin content in the P. purpureum
PBS is considerably lower than that in the G. pacifica PBS; this
is because all phycourobilins in the P. purpureum PBS originate solely
from the LRγ proteins, whereas in the G. pacifica PBS all phycoerythrin
β subunits, in addition to the LRγ proteins, also contain phycourobilin.
The lower phycourobilin content and the reduced number of total bilins
Fig. 2 | Interactions of the linker proteins LRγs and LRCs with chromophores
in the rod Rc. a, Bottom, overall structure of the rod Rc with the hexamers
shown in surface representation and the linker proteins shown in cartoon
representation. Top, structure of the layer Rc3I. Proteins and bilins are shown
in cartoon and sphere representations, respectively. Three subunits are
coloured differently and the β82 PEBs are boxed and analysed in detail in b-d.
b, Theinteractions between the residue F80 andthe ilin yi, fromLgy4 with 
thebilin "gc, The interaction between F139 from L,y4 and the bilin "5g". 


in the P. purpureum PBS are consistent with the fact that P. purpureum lives
at the sea surface, where the light intensity is higher than
beneath the sea surface, where G. pacifica is found.


Interactions of LRγs with chromophores

The (αβ)3 trimers of the phycocyanins, phycoerythrins and allophycocyanins
have very similar ring-like structures, in which the central cavity
is a common feature. Three β82 chromophores are located near to
the inner cavity and are subjected to interactions with linker proteins
(Fig. 2a). The trimers Rc3I and Rd3I (from the type I rod Rc and the type
II rod Rd, respectively) are used here to illustrate how the rod linkers
LRγ4 and LRγ5 interact with the β82 chromophores (Fig. 2). Each of the
BS2 PEBs of Rc3I (denoted *'p™, ®1g8? and 'g™) is bound by two 
hydrogen bonds formed between the! nitrogen atoms of the pyrrole 
rings Band Cand the carboxyl group of the D85 residue of the B subu- 
nit (Fig. 2b-d), in agreement with the crystal structure of R-phyco- 
erythrin”, In particular, three aromatic residues of L,y4 (F80,F139 and 
F124) are located close to rings D of B°, &1p and °"B%", respec- 
tively, which will stabilize ring D and may expand the conjugation of 
the system owing to mm interactions (Fig. 2b-d). Moreover, an extra 
PEB from Lays (yj*2,)isadjacent—and very close-to the chromophore 
the distafice between their nearest two atoms is only2.9A 
(Fig. 2b). Therefore, the chromophore pair may further downgrade 
the energy level of 1B owing toexcited -state coupling”, with the 
result that *\g$? is probably at a lower energy level compared with 
that of *1g and *pS*, Notably, the trimer Rd3l in the type ll rod Rd 
displays similar structural features: one aromatic residue is close to 
each of the B82 PEBs, and an additional ilin fromthe linker Lyy5(y.) 
also resides close to the hasigs? (Extended Data Fig, 6a-d). Moreover, 
structural superimposition reveals that the LRγ linker proteins in the
outermost hexamers of various rods of PBSs from both P. purpureum and
G. pacifica also have similar structures (Extended Data Fig. 7a, b). These
key aromatic residues, and the cysteine residues that are used to link
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4, The interaction between F124 from L,y4and the bilin""%B°.e, A focused 
view of the structure of the layer Rell showing the central triangle area. PBPS, 
thelinker protein, bilinsand residuesareshownin surface, cartoon, ball-and- 
stickand stick representations, respectively. Two B82PCBsareboxed and 
analysed in detail in fand g. f, The strong interaction between HS8fromLacl 
and the bilin "Bg, The interaction between Y104 from Lycland the bilin 
Relig fh "YB has the shortest distance (33.0 A, redline) between the rodand 
the core compared to *“4p* (57.14, grey line) and *“"g (55.2, grey line). 


the bilins, are well conserved in red algae (Extended Data Fig. 7c). 
The structural features of the interaction between Ly and B82 PEBs 
therefore suggest that BS?PEB isin the lowest energy: stateamongthe 
three B82 PEBs, and that energy migration through "p*" PEB could be 
the major route by which to pass energy through the rod. 


Interactions of LRCs with chromophores

Energy is then transferred along the rods to the triangular area of
the core-proximal hexamer (Fig. 2e). Two types of rod use different
rod-core linkers to associate with the core. For Rc, which uses the
linker LRC1, a heterocyclic residue (H58) from LRC1 is located close to
Retlgs?, with a minimum distance of 2.8 A (Fig. 2f). The pyrrole group 
of H58 can forma strong 1-1 interaction with rings B and C of 4p’? 
(ref. "). However, just one aromatic residue (Y104) from Lyc1‘ forms a 
relatively weak n-m interaction with ring D of ®“"g**—because ofthe 
longer distance (4.3 A) compared with that between H58 and ®*"B.* 
~and no aromatic residues interact with ®4g** * (Fig. 2e,g). Therefore, 
modified by the specific surroundings, "8. may be in the lowest: 
energy state among the three B82 chromophores. Moreover, ig! 
has the shortest distance to the core compared with "4B" and ®'p? 
(Fig. 2h), which further suggests that it may act as an energy-transit 
station, converging the energy absorbed by the rod and transferring 
it to the core. Similar situations are found for another two rods of type
I and the three type I rods in G. pacifica (Extended Data Fig. 7d). This
histidine residue is conserved completely across different red algal and
cyanobacterial species, which is indicative of its functional importance
(Extended Data Fig. 7e).
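
The minimum distances quoted in this section (for example, between a linker residue and a bilin, or between a rod bilin and the core) are nearest-atom distances between two groups of atoms. A minimal sketch of such a calculation is given below; the coordinates are made-up placeholders rather than atoms from the deposited model, and `min_distance` is a hypothetical helper name.

```python
import math


def min_distance(atoms_a, atoms_b):
    """Return (distance, index_a, index_b) for the closest pair of atoms.

    Each atom is an (x, y, z) tuple in angstroms.
    """
    return min(
        (math.dist(a, b), i, j)
        for i, a in enumerate(atoms_a)
        for j, b in enumerate(atoms_b)
    )


# Placeholder coordinates (angstroms) standing in for a side chain and a bilin.
residue_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
bilin_atoms = [(4.0, 0.0, 0.0), (1.0, 3.5, 0.0)]

distance, i, j = min_distance(residue_atoms, bilin_atoms)
print(f"closest pair: residue atom {i}, bilin atom {j}, {distance:.1f} angstroms")
```

The brute-force double loop is adequate for a residue and a single chromophore; comparing all bilins of a rod with all bilins of the core would be better served by a spatial index.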

For Rd—which uses Ly.2—two aromatic residues from L,-2 form par- 
allel-displaced and T-shaped 1-1interactions with®“"B% respectively 
(Extended Data Fig. 6f). By contrast, each of another two β82 PEBs
interacts with only one aromatic residue (Extended Data Fig. 6g, h).
Superimposition of the LRC proteins (LRC2 and LRC3) of type II rods
from both P. purpureum and G. pacifica shows that two such aromatic
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Fig.3| The bilins of ApeD and ApcF and their surroundingresidues.a, The 
surrounding residues of the bilin “a®!,,. The residuesandbilin are shown in 
stick and ball-and-stick representations, respectively.b, Theinteractions 
between “Band ApcF. The residuesand bilin are shown in tickand ball- 
and-stick representations, respectively. , “8%, isburied by acontiguous 
hydrophobic cap formed by the linker protein Lqy.Loyisshownincartoon 
representation in redand the cap isdisplayedin surface representation. "“p%, , 
isshown in sphere representation.d, The interactions between the capand 
“8° The hydrophobic residuesin the cap areshownin stickrepresentation 
inred. 


residues exist in all cases (Extended Data Fig. 7f). Sequence alignment
also shows that these aromatic residues are conserved, which suggests
their importance in fine-tuning the energies of the bilins (Extended
Data Fig. 7g).


Key chromophores in the core 

Previous studies have shown that three PCB chromophores in ApcD, 
ApcF and the a subunit of Ley (“oapen ee and “oj8*) perform 
critical functions in energy transfer in the core. However, why each of
these chromophores has a unique function remains to be clarified.
Here we use our high-resolution structure to analyse the immediate 
surroundings of these key core chromophores in their native states. 

Functionally, ApeD isthemain protein responsible for energy trans- 
fer to photosystem I". In our structure, two aromatic residues—WS7 
from ApeD and Y73 from the B subunit of the core trimer A3—form 
‘T-shaped and parallel-displaced n-interactions with “at. respec- 
tively; this enhances the tight fitting of ring D (Fig. 3a), which is consist- 
ent with the crystal structure of ApcD from Synechocystis PCC 6803 
(PDB: 4P05)”, Notably, we observed that W87 was surrounded by R83 
and R90 from ApcD and Y73 from the core A3, which provided two 
cation-minteractions and one T-shaped n-minteraction to WS7, respec- 
tively (Fig. 3a). It can therefore be inferred that the presence of these 
three residues is necessary to stabilize the orientation of W87, which 
iscritical for the conformation of “a... Inaddition, more related 
interactions between residues and “a),.,are extracted from thehigh- 
resolution structure of the entire PBS. The cationic side chain of 
R83 extends to the top of ring C of “a's; co, forming the cation-rinter- 
action‘, F59 and Y65 may contribute two additional n-interactions 
toring A of “acy (Fig. 3a). We then superimposed ApcDs from 
P.purpureum, G. pacificaand Synechocystis PCC 6803 with the a subu- 
nit of the core A3, and found that W87, F59 and Y65 are common toll 
ApeD molecules, although Y65 was replacedby V65inthe A3a subunit 
(Extended Data Fig. 8a). 

ApcF playsa crucial rolein energy migration to the terminal chromo- 
phore of Ly", Analysis of the PCB pocket of ApcF showed that the 
positively charged R89 formed one cation-m interaction with 
ring Cof BX. ,, and Y93 and Y97 formed one T-shaped w-1and one 


Fig.4| The conformation of PCBinLoy.a, Structural alignment of thea 
subunit of Ly (a!) inthe PBS and the recombinant « subunit of L.y,(PDB: 
4XX1).ato'and 4XXIare coloured red and pale yellow, respectively. The bilinin 
a'<sjsshowninball-and-stick representation. Two extraloops (V77-A98and 
F132-MI45) are displayed in sausage representation. b, Structural differences 
between a! Mand 4XXI. Two different conformationsof tryptophan (W154 in 
wand W164 in 4X1) are shown in stick representation in red and yellow. Y140 
and R144 from one loop are shown in surface representationinred. ¢, Steric 
clashingis observed between Y140 fromL.yand the ZZZasaconfiguration of 
*%q/* (grey), but isabsent between 140 and the ZZZssa configuration of “ais 
(red). d, Cryo-EM densities (mesh) of the bilins (stick) inthe «subunit of the 


core (Core PCB, yellow)anda'=(q!®, red)show the enhanced coplanarity of 
rings A andB of "ai 


parallel-displaced n-1interaction with ring, respectively. Moreover, 
R89, Y93 and Y97 interact with each other by either cation-r or 1 
interactions (Fig. 3b). The superimposition of ApcF molecules from 
P.purpureumand G. pacifica with the subunit fromthe core A2shows 
that these three residues existinthesame position inall proteins, sug- 
gesting their importance for the stability of the PCB (Extended Data 
Fig. 8b). Except for these common features, an aromatic residue from 
ApcF (F60) is located above ring A of “B%”_, (Fig. 3b); an aromatic 
residue was also found in the ApcF of G. pacifica (Y60), whereas this 
residueis replaced by L60in other B subunits (Extended Data Fig. 8b). 
This aromatic residue may therefore form additional -rinteractions 
with “By eand hence lower its energy. Another notable feature inour 
structure of the complete PBS is that Lc is directly involved in the 
interaction with “B,” _,. Several hydrophobic residues of L_,,arelocated 
at the ApcF/Lq, interface and within 4 A of i this creates a 
contiguous hydrophobic cap’ thatburies “9p”. (Hig. 3c, d, Extended 
Data Fig. 8c), which can enhance the stability of the conformation of 
Ber Theresidues presentaround the chromophores from other B 
subunits (2p*'and“2B5') are less hydrophobic (Extended Data Fig. 8d). 
Similarly, this hydrophobic cap is also found around the PCB of ApcF 
from the G. pacifica PBS (Extended Data Fig. 8e). 

The terminal chromophore PCBin Ly(*2a8*) exhibits fluorescence 
with similar emission wavelengths to those of the intact PBS, andisat 
alowerenergy than the upstream PCBs". Although the overall structure 
of the c-subunit domain of Lex (a) overlapped well with the recom- 
binant gy, (PDB: 4XXI)"° (Fig. 4a), some differences and new structural 
information are revealed in this study of the native PBS. In structure 
4XXI, two different conformations of W164 are found above ““aj°: 
oneis parallel toringD and the otherisnearly perpendicular”, However, 
in our structure, W154 at the same position displays only one confor- 
mation, parallel to ring D (Fig. 4b)—this indicates that the native Ley 
hasaunique preference for how theside chain of sucharesidueis posi- 
tioned. Moreover, compared with 4XXI, two extraloops(V77-A98 and 
F132-M145) were resolved in our at (Figs. 1c, Sa). Two residuesin this 
loop, Y140 and R144, are in direct contact (less than 4 A) with oj84 
Fig. 5 | Key bilins in the core. a, Different views of the core with the bilins in
different core layers shown in different colours. Bilins in αLCM (αLCM′), ApcD
(ApcD′) and ApcF (ApcF′) are coloured red, orange and brown, respectively.
Bilins in rods are the same colours as the rods in which they are located
according to the colouring scheme in Fig. 1d. Bilins are shown in stick
representation, and key bilins are shown as thicker sticks. The numbers


(Fig. 4b). In particular, the side chain of Y140 is oriented towards the 
inside of the “a8* pocket (Fig. 4b). In such a conformation there will 
exista steric clash between Y140 and theZZZasa configuration of “7qi86, 
thus providing a driving force for the formation ofthe 27Zssa col. 
figuration (Fig. 4c, Extended Data Fig. 8f). Comparison of a'™with five 
other similar subunits reveals that the orientation of this tyrosine in 
atorjs opposite to that in other subunits (Extended Data Fig. 8g). There- 
fore, Y140 of L,,.is another factor that causes “a}*° to uniquelyadopt 
theZZZssa geometry, which exhibits enhanced coplanarity of rings A 
and B compared with other PCBsin the a subunits of the core (Fig.4d, 
Extended Data Fig. 3c). 

In addition to “ohn, ee and “aiS*, the energy states of some 
other chromophores in the core are subjected to modification by the 
linker proteins. The shortest distance between rod Ra and the core was 
found between "g57/Ra and ““a§! (31 Å) (Fig. 5a), which may facilitate 
energy transfer from Ra to the core. The bilin nearest to ZoS! is ft, 
rings C and D of which form a parallel-displaced π-π interaction with 
F361 from L,, (Fig. 5b), and thus may mediate energy transfer to aye ae. 
The energy absorbed by rods Rb and Re’ may travel through the core 
layer B1 to cg", the nearest bilin to the basal cylinders (Fig. 5a). Bilin 
516% may play an essential role in this process because it is subjected 
to a parallel-displaced π-π interaction with F850 from LCM (Fig. 5c). In 
the basal cylinders, the two bilins °B51/7BS' and Aug’ *1B° (which are 
adjacent to the bilins on ApcF/ApcF’ and separated from them by 34.8 Å 
and 25.7 Å, respectively) have special microenvironments (Fig. 5a). 
2p !' is affected by the π-π interactions with Y416 and F420 from LCM 
(Fig. 5d) and *!p*" is affected by several π-π interactions between its 
rings C and D with Y443, Y583 and F610 from (Fig. 5e); this suggests 
that these two bilins may facilitate energy flow to “BX” ,/*7By7_.. The 
bilin pair “a4 and “a$! shows the shortest distance (86.1 Å) between 
rod Rd and the core (Fig. 5a). The bilin “BS! may mediate further energy 
transfer because it is subjected to π-π interactions with Y63 from L. 
(Fig. 5f). For Re, both hexamers Re1 and Re2 attach to the core layer A1; 
as such, the energy could flow from either Re1 or Re2 to the core. The 
shortest distance between Re2 and the core is from “a$*/Re to “as! 

indicate the distances (Å) between the bilin pairs. b, Interaction between F361 
from L,, and the bilin “p". c, Interaction between F850 from L., and the bilin 
Sig%. d, Interactions between Y416 and F420 from LCM with the bilin “B". 
e, Interactions between Y443, Y583 and F610 from e with the bilin "p%. 
f, Interaction between Y63 from L, and the bilin “p!. g, The interaction 
between F454 from L,. and the bilin “BS. 


(33 Å) (Fig. 5a). The energy could then travel via the “'p", because this 
bilin has the shortest distance to “a$! and is affected by F454 from LCM 
through two parallel-displaced π-π interactions with rings C and D 
(Fig. 5g). Together, our results show that core linker proteins are extensively 
involved in the modulation of the energy states of core bilins to 
ensure the efficient unidirectional transfer of energy. These findings 
provide the framework for a detailed examination of energy transfer 
in future studies. 

Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Preparation of phycobilisomes 

P. purpureum (from the UTEX Culture Collection of Algae, UTEX 2757) was 
cultured in Bold NV: Erdshreiber (1:1) half-seawater medium, bubbled 
with sterile-filtered air at 22 °C, under a 16 h:8 h light-dark cycle, 
with a white-light flux of about 37 μmol photons per m² per second. 
Algal cells were collected by centrifugation for 10 min at 6,000g, and 
resuspended in Buffer A (0.65 M Na/KPO4 buffer with 0.5 M sucrose 
and 10 mM EDTA, pH 7.0) at 0.3 g of wet weight per ml. The cells were then 
homogenized twice at 4 °C using a French Press (EmulsiFlex-C3, Avestin) 
at 4,000 p.s.i., and phenylmethylsulfonyl fluoride was added to a 
final concentration of mM. After 30 min of incubation with lauryldimethylamine 
N-oxide (Sigma) (48 mg per g of wet algal cells), debris and 
supernatant chlorophyll were removed by centrifugation at 20,000g 
for 30 min at 18 °C. The middle aqueous violet solution was loaded on 
a discontinuous sucrose gradient (2 ml of 0.5 M, 2 ml of 0.75 M, 2 ml of 
1.0 M, 2 ml of 1.5 M, 1 ml of 2.0 M, all in Buffer B: 0.75 M K/NaPO4 buffer 
with 10 mM EDTA, pH 7.0) and spun at 120,000g for 4 h at 18 °C using a 
SW41 rotor on an Optima XPN-100 centrifuge (Beckman Coulter). Three 
visible bands were obtained after centrifugation, and violet band 1 is 
the main layer of intact PBSs (Extended Data Fig. 1a). 


Absorption and fluorescence spectrum measurement 
Absorption of the intact PBS was measured from 300 to 800 nm 
using an Ultrospec 2100 Pro ultraviolet-visible spectrophotometer 
(Biochrom). 

Fluorescence emission spectra were recorded using a Hitachi FL-4500 
fluorescence spectrophotometer at room temperature. After exciting 
at 450 nm, fluorescence emission was monitored from 500 to 700 nm. 


Mass spectrometry analysis 

Mass spectrometry analysis was performed as previously described*. 
In brief, the purified PBS was separated by 4-12% Bis-Tris SDS-PAGE 
in MES buffer and the gel was stained with ZnSO4 to detect bilin-containing 
proteins with ultraviolet light by Zn-enhanced fluorescence. 
Then, four fluorescence bands with molecular mass greater than 
25 kDa were excised for in-gel digestion and proteins were identified by 
mass spectrometry (Extended Data Fig. 1b). The intact PBS complex in 
solution was also subjected to mass spectrometric analysis. Finally, all 
25 protein components of PBS were identified in the samples (Extended 
Data Fig. 1e). 


Cryo-EM sample preparation and data collection 
We used holey-carbon copper grids (Quantifoil R2/2, 400 mesh) covered 
with homemade ultrathin carbon for cryo-EM sample preparation. 
Cryo-EM grids were prepared with Vitrobot Mark V (FEI Company) at 
16 °C and 100% humidity. The grids were glow-discharged after adding 
50 μl amylamine to a glass culture dish in the plasma cleaner and vapouring 
it into the air. We added a 1.5 μl aliquot of protein with a concentration 
of 1.5 mg ml⁻¹ to the grids and waited for 60 s, and then added 3.5 μl 
of 50 mM Tris buffer (pH 8.0) to the grids and quickly mixed with the 
sample twice to reduce the salt concentration. The grids were then blotted 
for 3.5 s and plunged into liquid ethane cooled by liquid nitrogen. 
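
As a rough check of the on-grid dilution step described above, the following sketch computes the dilution factor and the resulting concentrations when 1.5 μl of sample is mixed with 3.5 μl of Tris buffer. It assumes ideal mixing and, purely for illustration, that the sample is still in Buffer B (0.75 M phosphate); the text does not state the buffer composition of the sample at this stage. The function name `dilute` is hypothetical.

```python
def dilute(conc, sample_ul, diluent_ul):
    """Concentration after mixing a sample with a diluent that lacks the solute (ideal mixing)."""
    return conc * sample_ul / (sample_ul + diluent_ul)


SAMPLE_UL = 1.5   # protein aliquot applied to the grid (from the text)
TRIS_UL = 3.5     # 50 mM Tris buffer, pH 8.0, added on the grid (from the text)

dilution_factor = (SAMPLE_UL + TRIS_UL) / SAMPLE_UL
protein_mg_ml = dilute(1.5, SAMPLE_UL, TRIS_UL)    # starting at 1.5 mg/ml (from the text)
phosphate_m = dilute(0.75, SAMPLE_UL, TRIS_UL)     # ASSUMPTION: sample is in Buffer B (0.75 M)

print(f"dilution factor: {dilution_factor:.2f}x")
print(f"protein after mixing: {protein_mg_ml:.2f} mg/ml")
print(f"phosphate after mixing (assumed Buffer B): {phosphate_m:.3f} M")
```

Under these assumptions the salt concentration drops by a little more than threefold before blotting, which is the stated purpose of the mixing step.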
The cryo-EM data were collected using a Titan Krios microscope 
(FEI) operated at a voltage of 300 kV and equipped with a Cs corrector, 
GIF Quantum energy filter (Gatan) and a K2 Summit direct electron 
detector (Gatan). A preset defocus range of -1.2 μm to -2.2 μm was 
used. All cryo-EM images were recorded at a nominal magnification of 
105,000x in super-resolution mode. Each stack was exposed for 5.6 s 
with an exposure time of 0.175 s per frame and recorded as a movie of 
32 frames, resulting in a total dose of approximately 48 electrons 
per Å² for each stack. GIF was set to a slit width of 20 eV. The data were 
collected automatically using the software AutoEMation**. The stacks 
were motion-corrected with MotionCor2 and binned twofold, resulting 
in a pixel size of 1.091 Å per pixel. 
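
The acquisition parameters above are internally consistent, which can be verified with a few lines of arithmetic. The sketch below derives the frame count, the dose per frame and the pixel size before binning from the stated values; the variable names are illustrative, and the Nyquist limit is the standard two-pixel bound rather than a number reported in the text.

```python
TOTAL_EXPOSURE_S = 5.6    # exposure per stack (s)
FRAME_TIME_S = 0.175      # exposure per frame (s)
TOTAL_DOSE = 48.0         # approximate total dose per stack (electrons per Å^2)
BINNED_PIXEL_A = 1.091    # pixel size after twofold binning (Å)

n_frames = round(TOTAL_EXPOSURE_S / FRAME_TIME_S)   # frames per movie
dose_per_frame = TOTAL_DOSE / n_frames              # electrons per Å^2 per frame
dose_rate = TOTAL_DOSE / TOTAL_EXPOSURE_S           # electrons per Å^2 per second
super_res_pixel_a = BINNED_PIXEL_A / 2              # pixel size before twofold binning (Å)
nyquist_a = 2 * BINNED_PIXEL_A                      # best resolution the binned data can carry (Å)

print(f"frames per movie: {n_frames}")
print(f"dose per frame: {dose_per_frame:.2f} e-/Å^2")
print(f"dose rate: {dose_rate:.2f} e-/Å^2/s")
print(f"super-resolution pixel size: {super_res_pixel_a:.4f} Å")
print(f"Nyquist limit after binning: {nyquist_a:.3f} Å")
```

The computed frame count matches the 32-frame movies described in the text.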


Cryo-EM data analysis 
A total of 16,218 micrographs were collected. Micrograph screening, 
manual particle picking and normalization were performed using 
EMAN2* and RELION3.0 beta*”*”. The contrast transfer function 
parameters of each micrograph were estimated using CTFFIND4*", and 
automatic particle picking, all 2D and 3D classification, 3D refinement 
and local defocus calculation were performed with RELION3.0 beta*™. 
The workflow of the data analysis is shown in Extended Data Fig. 2f. 
Two batches of data were collected and processed individually at the 
beginning. Particles were first manually picked from a small set of 
micrographs to produce templates for autopicking. Then particles 
were autopicked on all micrographs and manually screened to eliminate 
aggregation and ice contamination. Finally, 322,889 and 363,480 
particles were selected for the next 2D classification. After several 
rounds of 2D classification, 299,888 and 333,012 particles were left 
for the 3D classification. After 3D classification, two classes from each 
dataset with good quality were selected for the final reconstruction. At 
this point, we calculated the local defocus values for each particle and 
re-extracted particles from the dose-weighted micrographs®. Then the two 
batches of data were merged to perform the 3D refinement. The final 
resolution of the 3D auto-refinement after post-processing was 2.82 Å 
with a final particle number of 191,825 after imposing the C2 symmetry. 
Application of a mask for the core region during refinement further 
improved the resolution of this region to 2.68 Å. We also applied local 
masks for each rod and extra phycoerythrin hexamer, which resulted in 
improved quality of local maps with resolutions ranging between 2.77 Å 
and 3.56 Å. The maps for the target regions were extracted from the 
overall map by Chimera®, and the masks were created by RELION3.0 
beta’”~®. All resolutions were estimated with the gold-standard Fourier 
shell correlation 0.143 criterion with high-resolution noise substitution. 
All the local resolution maps were calculated using ResMap*. 
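
For readers unfamiliar with the 0.143 criterion mentioned above, the following is a minimal NumPy sketch of how a Fourier shell correlation curve between two half-maps can be computed and converted to a resolution estimate. It is not the RELION implementation: it omits masking, high-resolution noise substitution and interpolation between shells, and the function names `fsc_curve` and `resolution_at_threshold` are hypothetical.

```python
import numpy as np


def fsc_curve(half1, half2):
    """Fourier shell correlation between two cubic half-maps, one value per integer-radius shell."""
    assert half1.shape == half2.shape and len(set(half1.shape)) == 1
    n = half1.shape[0]
    f1 = np.fft.fftshift(np.fft.fftn(half1))
    f2 = np.fft.fftshift(np.fft.fftn(half2))
    # Integer shell index (distance from the Fourier-space origin) for every voxel
    axis = np.arange(n) - n // 2
    zz, yy, xx = np.meshgrid(axis, axis, axis, indexing="ij")
    shell = np.rint(np.sqrt(xx**2 + yy**2 + zz**2)).astype(int)
    n_shells = n // 2
    inside = shell < n_shells
    idx = shell[inside]
    cross = np.bincount(idx, weights=(f1 * np.conj(f2)).real[inside], minlength=n_shells)
    p1 = np.bincount(idx, weights=(np.abs(f1) ** 2)[inside], minlength=n_shells)
    p2 = np.bincount(idx, weights=(np.abs(f2) ** 2)[inside], minlength=n_shells)
    denom = np.sqrt(p1 * p2)
    return np.divide(cross, denom, out=np.zeros_like(cross), where=denom > 0)


def resolution_at_threshold(fsc, pixel_size, box_size, threshold=0.143):
    """Resolution (Å) of the first shell whose FSC falls below the threshold."""
    below = np.nonzero(np.asarray(fsc)[1:] < threshold)[0]
    if below.size == 0:
        return 2.0 * pixel_size              # never drops: limited by Nyquist
    shell = below[0] + 1                     # +1 because shell 0 was skipped
    return box_size * pixel_size / shell     # shell k <-> spatial frequency k / (N * pixel)


# Synthetic demonstration: a shared signal plus independent noise in each half-map
rng = np.random.default_rng(0)
signal = rng.normal(size=(24, 24, 24))
half_a = signal + 0.5 * rng.normal(size=signal.shape)
half_b = signal + 0.5 * rng.normal(size=signal.shape)
curve = fsc_curve(half_a, half_b)
print(np.round(curve, 2))
print(resolution_at_threshold(curve, pixel_size=1.091, box_size=24))
```

Shell k of an N-voxel box with pixel size p corresponds to a resolution of N·p/k, which is how the threshold crossing is turned into a value in Å.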


Model building and refinement 
We searched the published genome and transcriptome database of 
P. purpureum” against the 25 protein sequences of the G. pacifica PBS 
using the Basic Local Alignment Search Tool (BLAST). A total of 24 
homologues, including eight PBP proteins and 16 linker proteins, were 
obtained by this procedure, and these proteins were used as the candidates 
for model building. Local maps generated by the different masks 
described above were used to facilitate the model building process. 
Because the sequences of P. purpureum and G. pacifica have high 
homology with each other, we first docked the structures of the 
G. pacifica PBS proteins (PDB: 5Y6P)* into the electron microscopy maps 
using Chimera®. All the PBP proteins and most of the linker proteins 
were fitted well. Then the sequence assignments were guided by well-resolved 
bulky residues such as phenylalanine, tyrosine, tryptophan 
and arginine, and the sequences of the G. pacifica PBS were replaced 
with corresponding residues in the P. purpureum PBS in Coot, and 
every residue was examined and manually adjusted to better fit in the 
map. Some of the LRγ4 proteins could not be fitted well at the N-terminal 
region. We first built the C-terminal CBDγ domain as described 
above, and then performed the de novo building in Coot with bulky 
residues as landmarks, as most of these residues were clearly visible 
in our cryo-EM maps. The linker protein located at the centre cavity of 
the hexamer Hd is LRγ6 in the G. pacifica PBS that contains the CBDγ 
domain; however, the density at this region in the P. purpureum PBS 
shows recognizable structural features of the Pfam00427 domain. 
Therefore, we named this linker protein LR6 and first docked the structure 
of the Pfam00427 domain from LR2 into the density. By carefully 
examining the densities outside the Pfam00427 domain of LR6, a YYW 
motif was unambiguously identified according to the clear side-chain 
densities. Then we obtained the full-length sequence of LR6 by searching 
the published genome and transcriptome database of P. purpureum 
for the protein containing both the Pfam00427 domain and the YYW 
motif. The sequence of LR2 was replaced with corresponding 
residues in LR6 in Coot® and de novo atomic model building was 
conducted for the rest of the sequence in Coot. Finally, 25 protein 
sequences were identified and confirmed by good agreement of the 
side-chain information between the sequences and the density maps 
(Supplementary Table). 

The initial model was completed via iterative rounds of manual building 
with Coot* and refinement with phenix.real_space_refine®**. During 
this process, each part of the whole PBS model corresponding to each 
local map was refined against the local map with secondary structure 
and geometry restraints to prevent overfitting. Then, all parts were 
merged into a whole PBS model and this overall model was refined again 
against the overall 2.8 Å map using phenix.real_space_refine”. The 
atomic model was cross-validated according to previously described 
procedures*. In brief, atoms in the final model were randomly shifted 
by up to 0.5 Å, and the new model was then refined against one of two 
half-maps generated during the final 3D reconstruction. FSC values 
were calculated between the map generated from the resulting model 
and the two half-maps, as well as the averaged map of two half-maps. 
We did not observe notable separation between FSCwork and FSCfree, 
indicating that our model was not over-refined (Extended Data Fig. 2e). 
The data collection, model refinement and validation statistics are 
presented in Extended Data Tables 1, 2b. The statistics of the geometries 
of the models were generated using MolProbity®. All the figures were 
prepared in PyMOL (http://pymol.org) or Chimera®. The sequence 
alignments were performed by ClustalX2" and created by ESPript®. 
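
The first step of the cross-validation described above, randomly shifting atoms by up to 0.5 Å, can be illustrated with a short NumPy sketch. The text does not specify how the displacements were drawn, so the scheme below (a uniformly random direction and a uniformly random distance between 0 and the maximum) is an assumption, and `shake_coordinates` is a hypothetical helper rather than a function of any refinement package.

```python
import numpy as np


def shake_coordinates(coords, max_shift=0.5, seed=None):
    """Displace each atom in a random direction by a random distance of at most max_shift (Å)."""
    rng = np.random.default_rng(seed)
    coords = np.asarray(coords, dtype=float)
    # Random unit vectors: normalized Gaussian samples are uniform on the sphere
    directions = rng.normal(size=coords.shape)
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # ASSUMPTION: shift magnitudes are uniform in [0, max_shift)
    distances = rng.uniform(0.0, max_shift, size=(coords.shape[0], 1))
    return coords + directions * distances


# Example with three made-up atom positions (Å)
atoms = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0]])
shaken = shake_coordinates(atoms, max_shift=0.5, seed=42)
print(np.linalg.norm(shaken - atoms, axis=1))  # per-atom displacement, each below 0.5 Å
```

The perturbed model would then be refined against one half-map and compared with both half-maps by FSC, as the text describes.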


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

The atomic coordinates have been deposited in the Protein Data Bank 
with the accession code 6KGX. The electron microscopy maps have 
been deposited in the Electron Microscopy Data Bank with accession 
codes EMD-9976 for the overall map and EMD-9977 through to EMD-9988 
for the 12 local maps. The raw electron microscopy images used 

Extended Data Fig. 1 | Preparation and characterization of the PBS from 
P. purpureum. a, Isolation of PBSs using sucrose density gradient 
centrifugation. Three visible bands were observed. Band 1 is the sample of 
PBSs used for single-particle analysis in this study. The purification of PBS was 
repeated independently at least three times with similar results. b, Analysis of 
the protein composition of band 1 by SDS-PAGE stained with ZnSO4 to enable 
the detection of bilin-containing proteins with ultraviolet light by Zn-enhanced 
fluorescence. The bands of LRγ4, 5, 7, 8 and PBPs identified by mass 
spectrometric analysis are indicated. For gel source data, see Supplementary 
Fig. 1. The purification and characterization of the protein composition was 
repeated independently at least three times with similar results. c, Absorption 
spectrum of band 1 and the PBS from G. pacifica. The peaks at 498 nm, 620 nm 
and 650 nm are from phycourobilins, PCBs of phycocyanins and PCBs of 
allophycocyanins, respectively. The peaks at 540 nm and 565 nm are from PEBs. 
The reduced absorption of the P. purpureum PBS compared with the G. pacifica 
PBS at 498 nm indicates that the phycourobilin content of P. purpureum is much 
lower than that of G. pacifica. d, Fluorescence emission spectra of the three 
bands. Emission maxima at 580 nm and 676 nm represent the disassembled 
phycoerythrin hexamer and the terminal emitter in the intact PBS, 
respectively. Band 1 has an emission peak at 676 nm, band 2 at 580 nm and band 
3 has two emission peaks at 676 nm and 580 nm, indicating that band 1 contains 
intact PBSs, band 2 contains free PBPs and band 3 contains partially 
disassembled PBSs. e, Results of the mass spectrometric analysis of purified 
PBSs. Two batches of sample were analysed. The similar results confirmed the 
consistency of our purification method. 

Extended Data Fig. 2 | Cryo-EM analysis of the PBS from P. purpureum. a, A 
representative motion-corrected electron micrograph of PBSs. Scale bar, 50 
nm. Tens of thousands of micrographs were collected with similar results. 
b, Fourier power spectrum of the micrograph showing the Thon ring extending 
to 2.25 Å. Tens of thousands of micrographs were collected with similar results. 
c, Typical good, reference-free 2D class averages from single-particle PBS 
images. Scale bar, 20 nm. More than three rounds of 2D class average were 
performed with similar results. d, Gold-standard Fourier shell correlation (FSC) 
curves for the 3D electron microscopy reconstructions of the PBS. Blue curve, 
FSC curve for the overall structure; green curve, FSC curve for the core region 
that was masked during refinement. e, FSC curves for the cross-validation of 
the atomic model. The small difference between work and free FSC curves 
suggested that the model was not overfitted. f, The workflow for the 2D and 3D 
classifications for cryo-EM data processing. The masking strategy for 
dealing with sub-regions of PBS is enclosed within dashed lines. For details, see 
‘Cryo-EM data analysis’ in Methods. 

Extended Data Fig. 3 | Characterization of different types of chromophore. 
a, Cryo-EM densities (mesh) of bilins (stick) bound to LRγ4 in the rod Rb, LRγ5 in 
rod Rd, LRγ7 in the hexamer Ha and LRγ8 in the rod Rd. b, The densities (mesh) of 
some PCB and PEB bilins (stick) in R-phycocyanins and phycoerythrins from 
rods Ra and Rb to show their different coplanarities. All of the density maps of 
PCB bilins showed a very flat conformation of rings B, C and D, consistent with 
the carbon-carbon double bond between rings C and D in PCB that constrains 
the movement of ring D, so that ring D is coplanar with the B-C plane. 
Conversely, most of the density maps of PEB displayed a curved conformation 
of rings B, C and D owing to the single carbon-carbon bond between rings C 
and D in PEB that allows the rotation of ring D, so that ring D deviates from the 
B-C plane. However, some PEBs in phycocyanin also showed a planar 
conformation, such as ™"p'" and ", although to a lesser extent than that 
for a typical PCB molecule. c, Dihedral angles of three kinds of chromophore. 
The dihedral angles ®,, W;,;... are defined by the atoms NA-C(4)-C(5)-C(6), 
C(4)-C(5)-C(6)-NB, NB-C(9)-C(10)-C(11)... etc. 

Extended Data Fig. 4 | Overall structure of the PBS from P. purpureum and 
comparison with that from G. pacifica. a, Schematic diagram showing the 
organization of the rods and the core from two perpendicular views. The 
colouring scheme is the same as in Fig. 1e. b, Structure of the core from two 
perpendicular views shows the assembly and arrangement of the core layers. 
c, Overall structure of the PBS overlapped with the G. pacifica PBS displayed in 
surface representation from three perpendicular views. The additional 
hexamers in the G. pacifica PBS are coloured white and labelled. d, Schematic 
model of the PBS architecture. The connections between PBS components are 
shown. Dark and light colours show C2 symmetric parts of rods. Dark and light 
salmon, phycoerythrin hexamers in rod; dark and light brown, extra 
phycoerythrin hexamers; dark and light forest green, phycocyanin hexamers; 
blue, allophycocyanin trimer; large rectangular box, Pfam00427 domains; 
small rectangular box, Pfam01383 domains; square box, CBDγ. e, Comparison 
of linker proteins from P. purpureum with those from G. pacifica. Structures of 
the 19 well-resolved linker proteins (magenta) are superimposed with those 
from the G. pacifica PBS (cyan). The linker proteins share very high structural 
conservation, such as the Pfam00427 domain in the rod-core linker (LRC)1-3/ 
LRC1’-3’, the rod linker (LR)1-3/LR1’-3’ and LCM/LCM’; the Pfam01383 domain in the 
core linker (LC)/LC’ and LR1/LR1’; the FAS1 domain in LRC6/LRC6’ and LR9/LR9’; the 
CBDγ domain in LRγ4-5/LRγ4’-5’ and LRγ7-8/LRγ7’-8’; the coiled-coil motif at 
the C termini of LRC2-3/LRC2’-3’; and the long α-helix in the middle of the LRC4-5/ 
LRC4’-5’. Note that LR6 from the P. purpureum PBS is different from LRγ6 from the 
G. pacifica PBS, therefore they are not aligned. Domains of Pfam00427 
(00427), Pfam01383 (01383), CBDγ and FAS1 are labelled. 


Extended Data Fig. 5 | Interactions between LRC proteins and the core. 
a, Organization of LRC proteins LRC1-3/LRC1’-3’ and the core. The grooves on the 
‘ac subunits that contact the linker helices are shown in red. b, Structural 
similarity and differences among Lect”, Lycl® and L scl. These rod-core linkers 
are superimposed relative to the Pfam00427 domain. The helices that interact 
with the core are boxed. c, Structural similarity of LRC2 and LRC3, as 
demonstrated by superimposition of the Pfam00427 domain at the N termini 
and the coiled-coil motif at the C termini. The helices interacting with the core 
are boxed. d-f, Interactions between the «'** subunit and the helices of LRC1* 
(d), LRC2 (e) and LRC3 (f). The residues involved in the interaction of LRC proteins 
are coloured green and shown in stick representation. The a" are shown in 
surface representation, and the residues involved in the interaction are red. 


Extended Data Fig. 6 | Interactions of the linker proteins LRγ5 and LRC2 with 
chromophores in the rod Rd. a, Bottom, overall structure of the rod Rd with 
the hexamers shown in surface representation and the linker proteins shown in 
cartoon representation. Top, structure of the layer Rd3I. Proteins and bilins are 
shown in cartoon and sphere representations, respectively. Three subunits 
are coloured differently and the three β82 PEBs are boxed and analysed in detail 
in b-d. b, The interactions between the residue Y63 and the bilin ys, from LRγ5 
with the bilin ™"p®. c, The interaction between F122 from LRγ5 and the bilin 
‘sig. d, The interaction between F107 from LRγ5 and the bilin "p™. 
e, A focused view of the structure of the layer Rdll showing the central triangle 
area. PBPs, the linker protein, bilins and residues are shown in surface, cartoon, 
ball-and-stick and stick representations, respectively. Three β82 PCBs are 
boxed and analysed in detail in f-h. f, The interactions between Y201 and F207 
from LRC2 and the bilin ™"g™. g, The interaction between Y90 from LRC2 and the 
bilin ®@4p*. h, The interaction between 137 from LRC2 and the bilin ""B. 

Extended Data Fig. 7 | Comparisons of linker proteins from both 
P. purpureum and G. pacifica. a, b, Structural alignment of LRγ linker proteins 
in the outmost hexamers of various rods from the P. purpureum PBS (a) and the 
G. pacifica PBS (b). β82 PEBs and residues of LRγ linker proteins are shown in 
ball-and-stick and stick representations, respectively. Note that an aromatic 
residue from the LRγ linker is present near to each β82 PEB to form π-π 
interactions, and one bilin from the LRγ linker (y,_) always provides additional π 
electrons to the conjugation system of the PEB. These aromatic residues 
and the bilins from LRγ linker proteins are conserved in both P. purpureum and 
G. pacifica. c, Sequence alignment of LRγ4-5 from P. purpureum and other red 
algae. Three aromatic residues interacting with the β82 PEBs and the cysteine 
residues used to link the bilins close to the β82 PEBs are marked by stars. 
LRgamma4_GP and LRgamma5_GP, LRγ4-5 from G. pacifica; PXF41621.1, 
γ-subunit from Gracilariopsis chorda; XP_005715244.1, γ-subunit from 
Chondrus crispus; OSX79262, γ-subunit from Porphyra umbilicalis; 
AAN39000.1, γ-subunit from Griffithsia japonica; AXQOSI79.1, γ-subunit from 
Agarophyton chilense. d, Structural alignment of LRC1 linker proteins from 
P. purpureum and G. pacifica in the phycocyanin hexamer showing the bilin B>” 
and the surroundings. The key histidine residue close to the PCB is 
conserved. e, Sequence alignment of LRC1 from P. purpureum and other red algal 
and cyanobacterial species. The key histidine residue close to the β82 PCBs is 
marked with a star. LRC1_GP, LRC1 from G. pacifica; YP_009294673.1, LRC1 from 
red algal G. chorda; YP_007627464.1, LRC1 from red algal C. crispus; 
YP_009413376.1, LRC1 from red algal P. umbilicalis; YP_009244497.1, LRC1 from 
red algal A. chilense; WP_006617749.1, LRC1 from cyanobacteria Arthrospira 
platensis; WP_009783358.1, LRC1 from cyanobacteria Lyngbya sp. PCC 8106; 
WP_017720249.1, LRC1 from cyanobacteria Oscillatoria sp. PCC 10802; 
WP_071516454.1, LRC1 from cyanobacteria Geitlerinema sp. PCC 9228. 
f, Structural alignment of the LRC2 and LRC3 linker proteins from P. purpureum 
and G. pacifica in the phycoerythrin hexamer proximal to the core showing the 
bilin By’ and the surroundings. Two aromatic residues near to the PEB are 
conserved in both P. purpureum and G. pacifica. g, Sequence alignment of 
LRC2-3 from P. purpureum and other red algae. Two aromatic residues close to 
the β82 PEBs are marked with stars. LRC2_GP and LRC3_GP are from G. pacifica. 
PXF39827.1, XP_005715536.1 and OSX69059.1 are from G. chorda, C. crispus 
and P. umbilicalis, respectively. 

Extended Data Fig. 8 | Characterization of ApcD, ApcF and the α subunit 
domain of LCM. a, Magnified view of the superimposition of ApcD proteins from 
P. purpureum, G. pacifica (GP_ApcD), Synechocystis PCC 6803 (4P0S_ApcD) and 
the α subunit of the core layer A3 (a CoreA3). Bilins and residues are shown in 
ball-and-stick and stick representations, respectively. Three aromatic residues 
near to the PCB are conserved in all ApcD proteins, but not in the α subunit of 
the core layer A3. b, Magnified view of the superimposition of ApcF proteins 
from P. purpureum and G. pacifica (GP_ApcF), and the β subunit of the core A2 
(B.Corea2). “p™,, is shown in ball-and-stick representation in sand. 
c, A schematic of interactions between “py, , and the hydrophobic cap. 
d, Magnified view of the PCB pocket of ApcF (left), ““8;" (middle) and “BS 
(right). The protein is shown in surface representation and coloured on the 
basis of amino acid hydrophobicity (see colour bar). The side chains of 
hydrophobic residues within 5 Å of the PCB are shown in stick representation. 
e, Magnified view of the structural alignment of the hydrophobic caps formed 
by Lo« proteins from P. purpureum and G. pacifica. f, Schematic of the steric 
hindrance experienced by Y140/LCM and the ZZZasa configuration of "al". 
g, Structural alignment of a's, ApcD, ApcF, the α subunit (ApcA_A2) and the β 
subunit (ApcB_A2) in the core. The PCB pockets are indicated in the magnified 
view on the right. 

Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics 

                                     Phycobilisome from P. purpureum 
                                     (EMDB-9976) 
                                     (PDB 6KGX) 
Data collection and processing 
  Magnification                      105,000 
  Voltage (kV)                       300 
  Electron exposure (e-/Å²)          48 
  Defocus range (μm)                 -1.2 ~ -2.2 
  Pixel size (Å)                     1.091 
  Symmetry imposed                   C2 
  Initial particle images (no.)      686,369 
  Final particle images (no.)        191,825 
  Map resolution (Å)                 2.82 
    FSC threshold                    0.143 
  Map resolution range (Å)           24~7.4 

Refinement 
  Initial model used (PDB code) 
  Model resolution (Å) 
    FSC threshold 
  Model resolution range (Å) 
  Map sharpening B factor (Å²) 
  Model composition 
    Non-hydrogen atoms               1014714 
    Protein residues                 125577 
    Ligands                          1598 
  B factors (Å²) 
    Protein                          57.95 
    Ligand                           71.06 
  R.m.s. deviations 
    Bond lengths (Å)                 0.009 
    Bond angles (°)                  2.064 
Validation 
  MolProbity score                   1.79 
  Clashscore                         8.07 
  Poor rotamers (%)                  0.53 
Ramachandran plot 
  Favored (%)                        94.88 
  Allowed (%)                        5.11 
  Disallowed (%)                     0.01 

Extended Data Table 2 | Summary of proteins, chromophores and model validation 

a 
                    Chromophore 
Subunit   Number    PCB                  PEB                  PUB                  Total 
          in PBS    Per subunit  Total   Per subunit  Total   Per subunit  Total 
αAPC      20        1            20                                                20 
βAPC      22        1            22                                                22 
ApcD      2         1            2                                                 2 
ApcF      2         1            2                                                 2 
αPC       36        1            36                                                36 
βPC       36        1            36      1            36                           72 
αPE       254                            2            508                          508 
βPE       274                            3            822                          822 
LC        2 
LCM       2         1            2                                                 2 
LRC1      6 
LRC2      2 
LRC3      2 
LRC4      2 
LRC5      2 
LRC6      2 
LR1       6 
LR2       2 
LR3       2 
LRγ4      10                             3            30      2            20      50 
LRγ5      6                              3            18      2            12      30 
LR6       4 
LRγ7      4                              3            12      2            8       20 
LRγ8      4                              1            4       2            8       12 
LR9       2 
Total     706                    120                  1430                 48      1598 

b 
Component*   MolProbity   Ramachandran plot (%)             R.m.s. deviations 
             score        Favored   Allowed   Outliers      Bond lengths (Å)   Bond angles (°) 
Core         1.52         96.53     3.47      0.00          0.004              1.518 
Ra/Ra'       1.75         94.43     5.57      0.00          0.009              2.149 
Rb/Rb'       1.39         96.32     3.68      0.01          0.007              1.919 
Rc/Rc'       1.53         95.96     4.04      0.00          0.009              1.952 
Rd/Rd'       1.64         95.77     4.23      0.00          0.012              2.076 
Re/Re'       1.62         94.76     5.23      0.01          0.008              2.287 
Rf/Rf'       1.78         93.33     6.67      0.00          0.007              1.919 
Rg/Rg'       1.44         96.16     3.84      0.00          0.010              2.175 
H/H'         1.65         94.64     5.31      0.05          0.013              2.247 
M/M'         1.46         95.42     4.58      0.00          0.006              2.239 
a, Numbers of proteins and chromophores in the PBS. b, Summary of model validation for the PBS components. *Core contains all α subunits, β subunits in core, and LC/LC', LCM/LCM', LRC4/LRC4', 
LRC5/LRC5' and LRC6/LRC6'; each rod (Ra/Ra'-Rg/Rg') contains all α subunits, β subunits and linker proteins in the rod; H/H' contains all extra hexamers (Ha/Ha', Hb/Hb', Hc/Hc' and Hd/Hd') including 
all α subunits, β subunits and linker proteins in the hexamers, and LR9/LR9'; M/M' contains individual (αβ) monomers and all individual β subunits. 

GPR52 is a class-A orphan G-protein-coupled receptor that is highly expressed in the 
brain and represents a promising therapeutic target for the treatment of Huntington's 
disease and several psychiatric disorders. Pathological malfunction of GPR52 
signalling occurs primarily through the heterotrimeric Gs protein’, but it is unclear 
how GPR52 and Gs couple for signal transduction and whether a native ligand or other 
activating input is required. Here we present the high-resolution structures of human 
GPR52 in three states: a ligand-free state, a Gs-coupled self-activation state and a 
potential allosteric ligand-bound state. Together, our structures reveal that 
extracellular loop 2 occupies the orthosteric binding pocket and operates as a built-in 
agonist, conferring an intrinsically high level of basal activity to GPR52°. A fully active 
state is achieved when Gs is coupled to GPR52 in the absence of an external agonist. 
The receptor also features a side pocket for ligand binding. These insights into the 
structure and function of GPR52 could improve our understanding of other self-activated 
GPCRs, enable the identification of endogenous and tool ligands, and guide 
drug discovery efforts that target GPR52. 


GPCRs are membrane proteins with 7 transmembrane helical domains, 
and over 800 members of this family are included in the human 
genome. Among them, more than 100 are orphan receptors, that is, 
receptors for which the endogenous ligands have not yet been identified". 
GPR52 is a class-A orphan GPCR but exhibits a low sequence 
homology (less than 20%) to non-orphan GPCRs, which hinders an 
in-depth understanding of its structure and the discovery of any tool 
ligands. GPR52 has important roles in the brain, and is therefore an 
emerging target for the treatment of a variety of psychiatric diseases*. 
In particular, GPR52 colocalizes with the D2 dopamine receptor (D2R) 
in the striatum of the basal ganglia and, through its Gs coupling activity, 
can antagonize D2R signalling by causing cellular accumulation of 
cAMP®. GPR52 agonists are therefore regarded as potential therapeutics 
for schizophrenia’, cognitive impairment®, psychiatric disorders”, 
brain malformation’ and hyperactivity*. In addition, antagonists or 
inverse agonists of GPR52 are possible drug candidates for the treatment 
of Huntington’s disease, as GPR52 is associated with the abnormal 
expression of huntingtin that is observed in patients with this 
disorder™”. 

The physiological functions of GPR52 are all closely related to its 
coupling to the heterotrimeric Gs protein and the downstream signalling 
that results. Several structures of GPCRs in complex with 
Gs have previously been determined, including the β2 adrenergic 
receptor (β2AR"), adenosine A2A receptor (A2AR”), calcitonin receptor 
(CTR®), glucagon-like peptide 1 receptor (GLP-1R"), calcitonin gene-related 
peptide receptor (CGRP") and parathyroid hormone receptor 
1 (PTH1R), enabling the general landscape of Gs recognition by GPCRs 
to be elucidated across class-A and class-B GPCRs. All of these GPCRs 
require an agonist to induce a conformational change and form the 
agonist-GPCR-G-protein complexes. As GPR52 is an orphan GPCR, 
the identity of its native ligand remains unknown. However, a previous 
study showed that GPR52 when expressed on its own (that is, without 
any agonist) exhibited a high level of basal activity’, reminiscent of 
the 5-HT2C serotonin receptor and several other understudied GPCRs 
that also show high levels of basal activity”. It is unclear how this high 
level of intrinsic activity is achieved, and whether a native stimulator 
(that is, endogenous agonist) is required for G-protein coupling and 
signal transduction. 

Here we present high-resolution structures of human GPR52 in the
ligand-free and Gs-coupled states. The structures reveal that
extracellular loop 2 (ECL2) of GPR52 occupies the orthosteric
ligand-binding pocket and has an essential role in the self-activation
of the receptor. We also report a structure of GPR52 in complex with a
surrogate agonist, c17, and identify a ligand-binding pocket. Our
results provide an integrated understanding of the structure and
function of GPR52 and its mechanism of self-activation.
Crystal structure of ligand-free GPR52


We first crystallized human GPR52 in the ligand-free (apo) state. The
two apo GPR52 structures we present (GPR52-Rub-apo and GPR52-Fla-apo)
were engineered with different intracellular loop 3 (ICL3)
fusion-protein partners (rubredoxin (Rub) and flavodoxin (Fla)) and
crystallized in different space groups. To further improve the
stability and surface expression of GPR52, we introduced 7 point
mutations to the wild-type sequence and removed 16 and 21 residues
from the N and C termini, respectively (Methods, Extended Data
Figs. 1, 2). These modifications can increase the yield of the
protein, as well as its stability, both of which are crucial for the
determination of the high-resolution crystal structures (Extended Data
Table 1). We found that the two apo structures are essentially
identical in their overall conformation at the transmembrane region,
with a root-mean-square deviation (r.m.s.d.) of 1.1 Å for the Cα atoms
of the helix bundles (Fig. 1a, Extended Data Fig. 1). This suggests
that the conformation of the receptor is not altered by crystal
packing. In particular, the ECL2 region is well-folded in both
structures and highly consistent (r.m.s.d. of 0.8 Å). Hereafter, we do
not differentiate between these two structures and refer only to
GPR52-apo unless otherwise noted. The cytoplasmic portion of GPR52-apo
adopts an inactive conformation that is characterized by a lack of
outward movement of transmembrane helix 6 (TM6), due in part to the
absence of binding to intracellular G proteins or other partner
molecules.
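
The r.m.s.d. values quoted above compare matched Cα positions in two
structures. As a minimal illustrative sketch (not the authors'
pipeline), the Python snippet below computes an r.m.s.d. for two
coordinate lists that are assumed to be already superposed and listed
in the same residue order; the function name and the toy coordinates
are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) points, assumed to be superposed beforehand."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the matched atom pair
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in ångströms (made up for illustration only):
# the second set is the first shifted by 1 Å along y.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(round(rmsd(ca_a, ca_b), 2))
```

In practice the superposition step (for example, a least-squares fit
of the helix-bundle Cα atoms) precedes this calculation, and the
choice of atoms included determines the value obtained.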


ECL2 occupies the orthosteric pocket 


Closer examination of the GPR52-apo structure reveals a feature that is
absent in all of the other GPCR structures that are currently available.
GPR52 has a 22-residue ECL2 that folds into a small module and occupies
the orthosteric binding pocket of the receptor, establishing
interactions with transmembrane helices to maintain its configuration
(Fig. 1b, c). Structurally, ECL2 can be further divided into two
segments. The first segment contains residues 179-181 and 191-200,
which together function as a lid to cover the pocket from the top of
the receptor (Fig. 1b, c). The second segment, in the middle of ECL2
(residues 182-190), adopts an extended conformation with a short 3₁₀
helix; this segment fits into the large hydrophobic cavity in the
orthosteric binding pocket and is surrounded by a panel of residues in
the core of the helix (Fig. 1c). These residues are topologically
equivalent in nearly all currently available class-A GPCR structures
and mediate consensus contact with ligands. Structural superposition
of this second segment of ECL2 with corresponding segments from other
ligand-bound GPCRs, such as the apelin receptor (APJR) in complex with
the apelin peptide analogue AMG3054, and 5-HT2BR in complex with the
serotonin receptor agonist LSD, shows the overlay of a consensus
ligand-binding pocket (Fig. 1b). In particular, on one side of this
pocket, the side chain of Tyr185^ECL2 packs tightly into a local
aromatic environment formed by the residues Tyr281^6.51, Tyr284^6.54
and Phe285^6.55 of TM6 (Fig. 1c) (superscript numbers use the
Ballesteros and Weinstein numbering system). In addition, there is a
salt bridge between Lys182^ECL2 and Asp188^ECL2, which appears to be a
strong conformational constraint that stabilizes the second ECL2 motif
in the pocket (Fig. 1c). Notably, there is a disulfide bond between
Cys193^ECL2 in the lid and Cys114^3.25 in TM3, which fixes the
conformation of the lid and in turn maintains the second ECL2 motif
in the pocket (Fig. 1a). This disulfide bond is highly conserved, and
has a key role in stabilizing different agonists in the orthosteric
binding pocket of other class-A GPCRs. The structural parallelism
strongly suggests that the second segment of ECL2 may behave as a
built-in 'agonist' for activating the receptor in the orthosteric
binding pocket (Fig. 1b, Extended Data Fig. 3a). Consistent with this
hypothesis, previous studies have shown that GPR52 exhibits high
constitutive activity. In contrast to APJR and 5-HT2BR, the first half
of ECL2 (residues 182-190) of GPR52 is buried in the pocket, and the
second half (residues 191-199) protrudes to the extracellular surface;
this ECL2 trajectory is the


Fig. 1 | Crystal structure of GPR52-apo and analysis of ECL2. a, The
GPR52-Fla-apo (grey) and GPR52-Rub-apo (blue) structures are overlaid
to show the canonical seven-transmembrane-helix topology of GPR52.
ECL2 is shown in red (in Rub-apo) and grey (in Fla-apo). Disulfide
bonds are shown as yellow sticks. H8, helix 8. b, Superposition of the
ECL2 ALM (yellow) with representative class-A ligands: AMG3054 in APJR
(RCSB Protein Data Bank (PDB) 5VBL, purple) and LSD in 5-HT2BR (PDB
5TVN, green) in the GPR52 pocket. The pocket is shown as a
semitransparent surface. c, Magnified view of the ECL2 ALM in the
orthosteric binding pocket. The lid and ALM of ECL2 are in pink and
yellow cartoon representation, respectively. Key interacting residues
are shown as sticks. The salt bridge between Lys182^ECL2 and
Asp188^ECL2 is shown as a dashed line. d, Superposition of ECL2 in
GPR52 (blue), APJR (purple) and 5-HT2BR (green) in side view and top
view. The ECL2 trajectories (from TM4 to TM5) are indicated with
arrows. e, Mutations that interfere with the conformation of the ECL2
ALM in the orthosteric binding pocket abolish downstream signalling in
a cAMP assay. The cAMP response level was compared between wild-type
(WT) GPR52 and various mutant versions (in Δ182-190 (GS) and Δ191-199
(GS), residues 182-190 and 191-199, respectively, were replaced with a
six-residue linker (GGSGGS)). Significance was determined by two-way
analysis of variance (ANOVA) without repeated measures, followed by
Dunnett's post hoc test (***P<0.0001). Data are mean ± s.e.m. (n = 3).


complete reverse of that observed in all GPCRs that have previously
been described (Fig. 1d). It is also noteworthy that a conserved
sodium-binding site is not present in the GPR52 structure, nor is a
sodium ion observed in the orthosteric pocket, consistent with the
high level of constitutive activity that is exhibited by the receptor.

Mutagenesis and cellular functional assays showed that deleting
residues 182-198, replacing residues 182-190 or 191-199 with a
6-residue linker (GGSGGS), breaking the disulfide bond between
Cys193^ECL2 and Cys114^3.25 or even mutating the single key residue
Lys182^ECL2 all markedly reduced the signalling activity of GPR52
(Fig. 1e). Despite these modifications, however, the levels of protein
expression and membrane trafficking were maintained at 30-50% compared
to the wild type (Extended Data Fig. 2d). This result confirms our
hypothesis that the uniquely folded ECL2 motif has a key role in
stimulating the intrinsic activity of GPR52. We refer to this new
motif hereafter as the agonist-like motif (ALM) (Fig. 1c).


Cryo-EM structure of GPR52-mini-Gs

Consistent with the intrinsic activity of GPR52 and confirming our
hypothesis of the ECL2 ALM, we were able to form a stable
GPR52-G-protein complex in vitro in the absence of an agonist. We
purified a version of GPR52 that contained two stabilizing mutations
(A130^3.41W and C314^7.50P), an N-terminal BRIL fusion protein
(Methods) and an intact ICL3, and mixed it with the Gα subunit of
minimal Gs (mini-Gs), Gβγ and a camelid antibody Nb35 that binds at
the Gαs-Gβ interface.


Nature | Vol 579 | 5 March 2020 | 153




Fig. 2 | Cryo-EM structure of GPR52 bound to heterotrimeric mini-Gs in
the absence of an agonist. a, The activity of GPR52 in the absence or
presence of the c17 agonist was monitored according to cAMP response
level. Data are mean ± s.e.m. (n = 3). b, Two orthogonal views of the
cryo-EM density map of the GPR52-mini-Gs-Nb35 complex, colour-coded by
protein. c, Ribbon diagram of


Analysis with size-exclusion chromatography revealed that GPR52
formed a monodispersed complex with mini-Gs and Nb35 in the absence
of an agonist; this complex remained stable at 4 °C for at least 10
days (Extended Data Fig. 4a). Two-dimensional classification analysis
of cryo-electron microscopy (cryo-EM) data revealed averages with a
clear shape that is typical of GPCR-G-protein complexes, again
indicating the formation of a stable complex (Extended Data Fig. 4c).
Next, we monitored the cellular activity of wild-type GPR52 in the
absence or presence of c17, a surrogate agonist. In accordance with
previous studies, GPR52 showed a very high level of basal activity
even in the absence of c17 (around 90% of the level of activity when
c17 was present) (Fig. 2a). These results confirm our hypothesis that,
unlike most of the GPCRs that have been studied so far, GPR52 can
couple to heterotrimeric Gs and activate the downstream signalling
pathway without stimulation from an agonist.

To understand the structural basis of the intrinsic activity of GPR52
and how this relates to G-protein coupling, we determined the cryo-EM
structure of the GPR52-mini-Gs-Nb35 complex at a nominal global
resolution of 3.3 Å and at around 3 Å resolution at the interface
between GPR52 and mini-Gs (Fig. 2b, Extended Data Figs. 4, 5). The
electron microscopy density map allowed us to unambiguously trace the
polypeptide chains and build the atomic structure of the entire
complex (Extended Data Table 2, Extended Data Fig. 6). The overall
conformation of the GPR52-mini-Gs-Nb35 complex (hereafter referred to
as GPR52-mini-Gs) is consistent with previously reported structures of
class-A active GPCR-G-protein complexes. More specifically, the
cytoplasmic region sits on the canonical surface formed by Gαs and Gβ,
and Nb35 stabilizes the complex as a wedge between Gαs and Gβ
(Fig. 2c).

The GPR52-mini-Gs interface consists of TM3 and TM5-TM7 of GPR52, and
the Ras-like GTPase domain of Gαs (Fig. 2d). The cytoplasmic ends of
TM2-TM3 and TM5-TM7 form a deep cavity that accommodates the α5 helix
of Gαs through both hydrophobic and electrostatic interactions
(Fig. 2d, Extended Data Fig. 7a). The aliphatic side chain of the
conserved Arg139^3.50 from TM3 of the receptor stacks with the Tyr391
aromatic ring of Gαs, anchoring the C terminus of the α5 helix of Gαs
in the centre of the cavity (Fig. 2d). TM5 has a very long cytoplasmic
extension that not only forms part of the α5-helix-binding cavity, but
also covers a substantial area of the surface of one side of Gαs
(Fig. 2d).
the GPR52-mini-Gs-Nb35 complex in the same view as b. d, Two
perspectives of the cavity formed by the cytoplasmic ends of TM2-TM3
and TM5-TM7, which accommodates the α5 helix of Gαs through both
hydrophobic and electrostatic interactions. e, Magnified view of the
interface between ICL1 of GPR52 and blade 7 of Gβ. Key residues at the
interface are shown as sticks.


On the opposite side of the α5-helix-binding cavity, a short helix in
ICL2 of GPR52 fits in reverse into a hydrophobic pocket that is formed
by helices αN and α5 as well as part of the central β sheet of Gαs
(Fig. 2d, Extended Data Fig. 7b).

Compared to the structures of other GPCR-G-protein complexes, the
interface of GPR52 with mini-Gs also involves direct interactions
between Gβ and ICL1 (Fig. 2e). The hydrophobic coiled ICL1 sits in a
shallow groove that is formed by the seventh blade of the Gβ
propeller, burying an exposed surface area of around 360 Å² (Fig. 2e).
The electron microscopy density map shows that Pro69 of GPR52 is in
close proximity to, and therefore probably stacks with, the Phe335
aromatic side chain of Gβ (Fig. 2e). At the periphery of the pocket,
multiple charged residues of Gβ are in close proximity to ICL1,
suggesting that additional electrostatic contacts also contribute to
the interaction between Gβ and GPR52 (Fig. 2e). In addition, the side
chain of Arg38 from the C-terminal end of the αN helix of Gαs sticks
out and mediates electrostatic interactions with the backbone of
GPR52, which helps to anchor ICL1 in the Gβ groove (Fig. 2e).
Together, these results show that GPR52 mediates extensive
interactions with the mini-Gs trimer and buries an exposed surface
area of around 1,790 Å², an area substantially larger than that
observed in most GPCR-G complex structures (Fig. 2e, Extended Data
Fig. 7c-h).


From apo to mini-Gs-coupled states

A comparison of the apo and the mini-Gs-coupled structures of GPR52
enables us to examine the conformational changes that are associated
with complex formation. In the GPR52-apo structure, the cytoplasmic
portion of the long TM5 was modified to facilitate the crystallization
of the protein (Fig. 3a). Structural overlay reveals an outward
movement of TM6 of about 6 Å (based on Cα of Val266^6.36) and a seesaw
winding of TM5 (Fig. 3b) in the mini-Gs-coupled relative to the apo
structure. The latter is characterized by an N-terminal inward
movement towards TM4 and C-terminal outward movement towards TM6 in
mini-Gs-coupled GPR52 (Fig. 3b). These movements induce a rotamer
change of the key amino acid Arg139^3.50, pointing it towards the
transmembrane core; as a result, TM3 shifts upwards by around 1.5 Å
(based on Cα of Arg139^3.50) and Arg139^3.50 stacks on the C terminus
of the Gαs α5 helix in the complex




Fig. 3 | Structural comparison of apo and mini-Gs-coupled GPR52.
a, Side view of the overlaid mini-Gs-coupled (green) and apo (blue)
GPR52 structures. b, Cytoplasmic view of the structural comparison
reveals an outward movement of TM6 of around 6 Å in the GPR52-mini-Gs
relative to the GPR52-apo structure. c, Two magnified orthogonal views
of the conformational changes in R139^3.50 and Y317^7.53, as well as
the peripheral transmembrane helices, between the mini-Gs-coupled
(green) and apo (blue) GPR52 structures. D138^3.49, R139^3.50 and
Y317^7.53 are shown as sticks. d, Superposition of ECL2 from the
GPR52-apo crystal structure (blue) and GPR52-mini-Gs cryo-EM structure
(green), showing the conformational consistency between the structures.


(Fig. 3c). The 'ionic lock' between residues Arg^3.50 and Glu^6.30 or
Asp^6.30 is a conserved conformational signature of all inactive
GPCRs. Although Asp260^6.30 is not visible in the crystal structure of
GPR52-apo, the salt bridge between Arg139^3.50 and Asp138^3.49
indicates that GPR52-apo is not in an active conformation (Fig. 3c).
In addition, during the transition from the apo to the mini-Gs-coupled
state, the side chain of another conserved residue, Tyr317^7.53,
rotates into the core of the helix bundle to lock TM6 into the active
position for G-protein binding (Fig. 3c). Of note, the C terminus of
TM6 is partially disordered in the electron microscopy map, suggesting
that this region is structurally dynamic upon G-protein binding.
Together, these structural rearrangements open space in the helical
core at the cytoplasmic region of GPR52, generating a crevice of a
sufficient size to accommodate the C terminus of the α5 helix of Gαs.

In contrast to the cytoplasmic region, the extracellular end of GPR52
that forms the canonical orthosteric binding pocket does not exhibit
marked changes between the apo and the mini-Gs-coupled GPR52
structures (Fig. 4f). In particular, the conformation of the ECL2
region is highly conserved, with a Cα atom r.m.s.d. of 2 Å (Fig. 3d,
Extended Data Fig. 6b). This finding suggests that the association
with mini-Gs induces major structural changes only in the cytoplasmic
region of the receptor. Notably, this closely resembles the changes in
conformation that occur during the transition of receptors A2AR and
5-HT2BR from the agonist-bound inactive state to the active
G-protein-coupled state (Extended Data Fig. 3c-e). In both cases,
coupling with G proteins induces conformational changes only on the
cytoplasmic side of the agonist-bound receptors that are capable of
transducing downstream signalling (Extended Data Fig. 3c, d).
Comparison of GPR52 to the A2AR and 5-HT2BR structures in the
agonist-bound inactive state shows that structural features at the
activation switches, such as the DRY motif, are highly consistent
(Extended Data Fig. 3e). Given that GPR52 can bind the trimeric Gs
protein and exhibits a high level of constitutive activity in cells
without an agonist (Fig. 2a), it is likely that the ECL2 ALM is
functionally equivalent to an agonist in other GPCRs.


Crystal structure of ligand-bound GPR52 


The canonical orthosteric pocket in GPR52 is almost fully occupied by
ECL2, leaving almost no space for a new ligand to bind. We therefore
set out to investigate the binding mode of a GPR52 agonist by
cocrystallizing GPR52 with the surrogate ligand c17. We used the same
construct here as for the crystallographic study of GPR52-Fla-apo. The
resulting structure was determined at 2.2 Å resolution. The overall
conformation of the c17-bound GPR52 is highly consistent with that of
GPR52-apo, with a Cα atom r.m.s.d. of 7 Å (Extended Data Fig. 1b). The
agonist c17 is in a C-shaped configuration perpendicular to the
membrane plane, and sits in the pocket through shape complementarity
(Fig. 4a, b). The pocket, which is mainly formed by TM1, TM2, TM7 and
ECL2, is very close to the extracellular surface (Fig. 4b, c). This is
in contrast to all available structures of GPCRs in complex with
ligands, in which TM1 is not in direct contact with any of the
ligands. In the GPR52-c17 complex, the N-terminal loop and ECL2 push
the ligand towards one side and contribute to the formation of a new
ligand pocket, which we refer to as the 'side pocket' to distinguish
it from the canonical orthosteric binding site. The space in the side
pocket is substantially limited by ECL2 and TM7; at the narrowest
point it is only 3.7 Å wide, which would barely fit a single layer of
planar aromatic rings.

The interactions between GPR52 and c17 involve hydrogen bonds,
hydrophobic contacts and aromatic stacking (Fig. 4b). There are
four pairs of hydrogen bonds between the ligand and the main-chain
atoms of the receptor: the top hydroxyl group of c17 with Ile189^ECL2
and Glu191^ECL2; the middle hydroxyl group with Asp188^ECL2; and the
amide group with Cys40. The ligand is further stabilized in the side
pocket through contacts with a group of hydrophobic residues (Tyr34,
Val39, Ile47, Leu101, Phe117^3.28 and Thr303^7.39). In particular, the
aromatic ring system of c17 forms π-π interactions with Phe300^7.36 in
the bottom of the pocket (Fig. 4b). Notably, molecular docking
analysis suggests that all reported GPR52 agonists probably occupy the
side pocket in a similar manner, although with different chemical
scaffolds (Extended Data Fig. 8a).

The side pocket in the structures of the GPR52-apo and GPR52-mini-Gs
complexes is smaller but consistent overall with that in the GPR52-c17
structure, except for the N-terminal loop, which is highly flexible
in the apo structures without the constraint of a bound ligand
(Fig. 4c). We noted that the Trp304^7.40 indole ring rotates into the
bottom of the pocket core in the apo structure, reducing the size of
the lower part of the pocket that would otherwise clash with the c17
ligand (Fig. 4c). Substituting Trp304^7.40 with an alanine residue did
not influence the c17-induced signalling activity (Fig. 4d),
consistent with the Trp304^7.40 side chain not interacting with c17 in
the ligand-bound state. At the same time, the pocket residues His25
and Ser26 on the N terminus, Ile189 on ECL2 and Phe117^3.28 on TM3
extend closer to the pocket core in the apo compared to the GPR52-c17
structure (Fig. 4c), further shrinking the side pocket. Such a small
pocket would bind only to ligands of a specific shape and size. The
overall conformation of ECL2 in the GPR52-c17 structure, however, is
consistent with that of the GPR52-apo and GPR52-mini-Gs structures
(Fig. 4e, f), suggesting that the ECL2 conformation is not altered by
the side-pocket occupancy.


Unique pocket and receptor conformation 

Knowledge of this small side pocket provides valuable insight in the
search for tool compounds and an endogenous ligand for GPR52. We
compared the side pocket of GPR52 to that of representative peptide
receptors, non-lipid small-molecule receptors and lipid-activated
receptors. We found that the side pocket is markedly smaller in volume
in GPR52 than in peptide receptors (1,054 Å³ for c17-bound GPR52 and
958 Å³ for GPR52-apo) (see Supplementary Table 2 for the comparison
with other receptors).

As GPR52 is colocalized with D2R in the basal ganglia and activation
of GPR52 counteracts signalling from the Gi/o-coupled D2R, we compared
the structure of GPR52-c17 with that of D2R as a representative
small-molecule receptor (Fig. 5a). We found that whereas c17 in GPR52
is located closer to TM1, TM2 and TM7, the D2R ligand (risperidone)
Fig. 4 | Crystal structure of GPR52-c17 and a novel ligand-binding
pocket. a, The structure of the GPR52-c17 complex (green) with ligand
c17 (orange) in the binding pocket. ICL3 is shown as a dashed line.
The Fo-Fc omit map (insert, grey mesh, contoured at 3.0σ) shows clear
electron density for ligand c17. b, Key molecular interactions in the
c17-binding pocket. c17 (orange) and the GPR52 residues that are
involved in ligand binding (green) are shown as sticks, the receptor
is in grey and hydrogen bonds are black dashed lines. c, Comparison of
the ligand-binding pocket between GPR52-apo and GPR52-c17. Residues in
the apo structure (purple for Rub-apo; yellow for Fla-apo), in
GPR52-c17 (green) and in GPR52-mini-Gs (blue) are shown. Ligand c17 is
shown


is closer to TM4-TM6, a ligand-binding mode that is more commonly
seen in other small-molecule and lipid receptors (Fig. 5b,
Supplementary Table 3). The arrangement of the conserved proline
residue in TM5 of GPR52 is unusual in that Pro214 overlaps with Val
but not the equivalent Pro^5.50 in D2R; this confers a winding mode to
TM5 that is unique to GPR52 among regular class-A GPCR structures
(Fig. 5a, c).

A structural similarity network for all class-A GPCRs for which the
inactive structures have been reported showed that GPR52 was the
only receptor with a Cα atom r.m.s.d. of more than 2.0 Å based on the
conformations of the transmembrane helices (Extended Data Fig. 8b).
The unique position of Pro214 and the resulting winding mode of TM5
are seen in all GPR52 structures, including the active-state
GPR52-mini-Gs complex. This configuration indicates that GPR52 cannot
use the consensus P^5.50-I^3.40-F^6.44 (PIF) motif to trigger the
outward bending of TM6 during activation (Extended Data Fig. 3b), and
thus may have to use a different mechanism of G-protein coupling.

Finally, the closest homologue of GPR52 is GPR21, another orphan
receptor. With overall sequence identity as high as 71% (Extended Data
Fig. 8c), the two orphan receptors are especially consistent at ECL2.
However, the ligand-binding residues in the side pocket are less
conserved, with only 59% identity. The less-conserved side pocket
suggests the possibility of distinct recognition modes and thus the
potential for development of selective modulators.
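
Identity figures such as those above are computed from a pairwise
alignment. The following Python sketch shows one common convention
(identical residues divided by the number of aligned, ungapped
columns); the convention, the function name and the toy sequences are
assumptions made for illustration and are not taken from the paper.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns in which neither
    sequence carries a gap character ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences (hypothetical, not GPR52 or GPR21):
# 9 ungapped columns, 7 of them identical.
seq_a = "ACDEFGHIK-"
seq_b = "ACDQFGHLKM"
print(round(percent_identity(seq_a, seq_b), 1))
```

Restricting the comparison to a subset of positions, such as the
residues lining a binding pocket, amounts to passing only those
alignment columns to the same function.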


Discussion 

The structure of GPR52 that we present here will allow accurate
modelling of other homologous orphan receptors. Our observation that
ECL2 occupies the orthosteric binding pocket and pushes ligand c17
into a very narrow cavity between ECL2 and TM1, TM2 and TM7 on
one side suggests that c17 may function as an allosteric agonist and
fit into a side pocket that has not been observed in the structures of
other GPCR-ligand complexes or allosteric modulators. Analysis of
the shape and the hydrophobic nature of the side pocket may provide
as orange sticks overlaid with a transparent sphere representation.
The four structures are superimposed and shown in grey cartoon.
d, Assessment of the efficacy of c17 as an agonist in wild-type GPR52
and GPR52 with a mutated residue in the pocket (W304^7.40A). Data are
mean ± s.e.m. (n = 3). HTRF, homogeneous time-resolved fluorescence.
EC50, half maximal effective concentration. e, ECL2 of GPR52-apo
(cyan), GPR52-c17 (green) and GPR52-mini-Gs (blue) are superimposed.
Key residues are shown as sticks. f, Extracellular view of
superimposed GPR52-apo (cyan), GPR52-c17 (green) and GPR52-mini-Gs
(blue), showing the consistent ECL2 conformation in the orthosteric
binding pocket.


further insight that will aid in the deorphanization of GPR52. Given
the small volume and hydrophobicity of the side pocket in GPR52, we
propose that the endogenous ligand (if any) that binds to this pocket
is most likely to be a small lipid molecule, as previously predicted by




Fig. 5 | Unique ligand-binding mode and structural conformation of
GPR52 compared to other class-A GPCRs. a, Side view of the GPR52-c17
complex (green-orange) and D2R-risperidone complex (grey-blue). The
conserved Pro residues in GPR52 and D2R are shown as sticks.
b, Position comparison of c17 in GPR52 with class-A small-molecule
receptors. Representative agonist ligands from 27 structures of
GPCR-ligand complexes (grey sticks) are superimposed on the GPR52
receptor structure (green) with c17 (orange) and risperidone (blue).
The schematic on the right shows the binding sites of common ligands
(grey) and c17 (orange) in side view. c, Sequence alignment of part of
TM5 in GPR52, β2AR, 5-HT2BR, APJR, D2R and A2AR. Conserved residues
are coloured in green. The top labels indicate Ballesteros and
Weinstein numbering.


phylogeny analysis. Therefore, as GPR52 is primarily expressed in the
brain, we suggest that future deorphanization efforts should start by
screening small lipids that are found in brain tissues.

We have demonstrated that ECL2 can serve as a built-in agonist,
leading to an active conformation of GPR52 and a high level of basal
signalling. This further highlights the need to search for an
antagonist or inverse agonist for research and therapeutic
applications, but thus far only one low-efficacy antagonist has been
reported. The GPR52 ligand-binding side pocket we have revealed can be
targeted by rational structure-based ligand design and holds promise
for selective drug screening owing to its allosteric-like features.
Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Molecular cloning of GPR52 crystallization constructs

The human GPR52 gene was subcloned into the expression vector
pFastBac1. For GPR52-Fla-apo, the flavodoxin fusion protein was
inserted into ICL3, replacing residues 236-261 of GPR52. The construct
contained residues 17-340 of GPR52 and a 10×His-tag at the C terminus
that is removable by cleavage with 3C protease. Haemagglutinin (HA)
signal peptide, Flag tag and thermostabilized Escherichia coli
apocytochrome b562RIL (BRIL) were added on the N terminus to enhance
receptor expression and were then removed during purification with a
tobacco etch virus (TEV) site. To improve protein stability, seven
point mutations were introduced: A130^3.41W, A264^6.34L, W278^6.48Q,
C314^7.50P, S318^7.54A, N321D and V323T. Cloning of the ligand-free
GPR52-Rub-apo construct was performed as described for GPR52-Fla-apo,
except that the rubredoxin fusion protein was inserted into ICL3
(replacing residues 235-263 of GPR52).


Expression and purification of GPR52 crystallization constructs
The GPR52 protein was expressed and purified as previously described.
We used the Bac-to-Bac baculovirus system (Invitrogen) in Spodoptera
frugiperda (Sf9) cells for expression. These cells were infected with
baculovirus at a density of 2 × 10^6 cells per ml. Cells were grown at
27 °C and collected 48 h after infection.

For cocrystallization with ligand c17, the cell membrane was washed
with a low-salt buffer (10 mM HEPES (pH 7.5), 20 mM KCl, 10 mM MgCl2
and protease inhibitor cocktail (Roche)) and three times with a
high-salt buffer (10 mM HEPES (pH 7.5), 1 M NaCl, 20 mM KCl, 10 mM
MgCl2 and protease inhibitor cocktail). Before solubilization,
purified membranes were incubated with 20 µM c17 for 3 h, then
incubated with 2 mg ml^-1 iodoacetamide (Sigma) at 4 °C for 1 h. The
protein was extracted from the membrane by 100 mM HEPES, 800 mM NaCl,
1.0% (w/v) n-dodecyl-β-D-maltopyranoside (DDM) (Anatrace) and 0.2%
(w/v) cholesteryl hemisuccinate (CHS) (Sigma) and stirred for 2.5 h at
4 °C. After centrifugation, the supernatant was incubated with TALON
IMAC resin (Clontech) at 4 °C overnight. Then the resin was washed
with 15 column volumes of buffer I: 50 mM HEPES (pH 7.5), 800 mM NaCl,
5% (v/v) glycerol, 0.05% (w/v) DDM, 0.01% (w/v) CHS, 10 mM MgCl2,
20 mM imidazole and 20 µM c17. The resin was resuspended with 3 column
volumes of buffer II: 25 mM HEPES (pH 7.5), 800 mM NaCl, 5% (v/v)
glycerol, 0.03% (w/v) DDM, 0.006% (w/v) CHS, 40 mM imidazole and 20 µM
c17. TEV protease was added with a molar ratio of 1:10, and the
mixture was incubated at 4 °C overnight. Next, the resin was washed
with 8 column volumes of buffer II to remove the HA, Flag tag and
BRIL. The protein was eluted with 3 column volumes of buffer III:
25 mM HEPES (pH 7.5), 800 mM NaCl, 5% (v/v) glycerol, 0.01% (w/v) DDM,
0.002% (w/v) CHS, 220 mM imidazole and 20 µM c17. The ligand-free
GPR52 protein was purified as described above, except that ligand c17
was not added during the purification process.


Crystallization 

The receptor was concentrated to around 50 mg ml^-1 with a 100-kDa
cut-off concentrator (Millipore). The protein sample was reconstituted
into a lipidic cubic phase by mixing 40% protein with 60% lipid (10%
(w/w) cholesterol and 90% (w/w) monoolein) in a syringe mixer. The
crystallization trials were set up by crystallization robot NT8
(Formulatrix). The protein and lipid mixture was dispensed in 40-nl
volumes on 96-well glass sandwich plates and overlaid with 800 nl of
precipitant solution in each well. For the GPR52-c17 complex, crystals
appeared after 1 day in 0.13-0.18 M sodium acetate, 0.1 M sodium
citrate pH 5.0 and 32-35% PEG400, and reached full size
(40 × 40 × 50 µm³) after 9 days.

For GPR52-Fla-apo, crystals appeared after 1 day in 0.1 M potassium
acetate, 0.1 M sodium citrate pH 5.0 and 30% PEG400, and reached
full size (15 × 15 × 80 µm³) after 5 days. For GPR52-Rub-apo, crystals
appeared after 1 day in 0.08-0.1 M magnesium sulfate, 0.1 M sodium
cacodylate trihydrate pH 6.2 and 28-31% PEG300, and reached full size
(120 × 60 × 10 µm³) after 1 week.
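
The mixing ratios given for the lipidic cubic phase translate into
component masses as in the short Python sketch below. It assumes that
the 40:60 protein-to-lipid ratio is by weight (the text specifies w/w
only for the cholesterol and monoolein fractions); the function name
and the 100-mg batch size are hypothetical.

```python
def lcp_composition(total_mg, protein_fraction=0.40, cholesterol_in_lipid=0.10):
    """Split a lipidic-cubic-phase batch into component masses (mg).

    Assumes the protein:lipid ratio is by weight; the lipid part is
    itself divided into cholesterol and monoolein (w/w).
    """
    if not 0.0 <= protein_fraction <= 1.0:
        raise ValueError("protein_fraction must lie between 0 and 1")
    if not 0.0 <= cholesterol_in_lipid <= 1.0:
        raise ValueError("cholesterol_in_lipid must lie between 0 and 1")
    lipid_mg = total_mg * (1.0 - protein_fraction)
    return {
        "protein_solution_mg": total_mg * protein_fraction,
        "cholesterol_mg": lipid_mg * cholesterol_in_lipid,
        "monoolein_mg": lipid_mg * (1.0 - cholesterol_in_lipid),
    }


# Hypothetical 100-mg batch using the ratios quoted in the text
batch = lcp_composition(100.0)
for name, mass in batch.items():
    print(f"{name}: {mass:.1f}")
```

Under these assumptions the cholesterol makes up 6% and the monoolein
54% of the total mixture.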


Crystal data collection and structure determination 

The X-ray diffraction data were collected at SPring-8 beamline 41XU,
Hyogo, Japan, using an EIGER X 16M detector (X-ray wavelength
1.0000 Å). A rastering system was used to find the best diffracting
region of single crystals. The crystals were exposed with a
10 (v) × 9 (h) µm minibeam for 0.2 s and 0.2° oscillation per frame.
All three datasets (GPR52-Fla-apo, GPR52-Rub-apo and GPR52-c17) were
processed using HKL2000. For GPR52-Fla-apo, a molecular replacement
method with Phaser was used to obtain the initial phase information,
using the structures of a homologous model of human GPR52 generated
by Rosetta and flavodoxin (PDB 1110) as search models. The initial
model and map were subjected to PHENIX AutoBuild using all data
including weak reflections of lower than 2.9 Å resolution. Most of the
transmembrane helices of GPR52 were traced. Refinement was performed
with PHENIX and REFMAC5, followed by manual examination and rebuilding
of the refined coordinates in Coot using both 2Fo - Fc and Fo - Fc
maps. Both the GPR52-Fla-apo and GPR52-Rub-apo datasets showed strong
anisotropy and were truncated on the STARANISO server
(http://staraniso.globalphasing.org) using the diffraction CC1/2
cut-off criterion of 0.10. The GPR52-Rub-apo structure was solved by
Phaser with the truncated model of GPR52-Fla-apo by deleting the
flavodoxin part. The rubredoxin molecule taken from PDB entry 6BD4
served as a second searching model by Phaser to generate a complete
model of GPR52-Rub-apo. The GPR52-Fla-c17 structure was solved by
molecular replacement with GPR52-Fla-apo as the searching model after
removing all water and other molecules. The model and restraints of
c17 were created using the Ligand Builder in Coot and placed into the
electron density map in the orthosteric binding pocket. The
Ramachandran plot determined by MolProbity indicates that 96.1% (3.9%)
of residues in GPR52-Fla-apo, 98.3% (1.7%) of residues in
GPR52-Rub-apo and 98.2% (1.8%) of residues in GPR52-c17 were in
favoured (allowed) regions. Data collection and structure refinement
statistics are listed in Extended Data Table 1.


Purification and formation of the GPR52–mini-Gs–Nb35 complex
For cryo-EM research, the construct of human GPR52 (residues 1–340)
was designed with two point mutations (A130W and C314P) and an
N-terminal BRIL fusion protein. Up to the point of the buffer II wash, the
purification process was the same as for ligand-free GPR52. After the
resin was washed with 8 column volumes of buffer II, the wash buffer
was changed to 3 column volumes of exchange buffer: 50 mM HEPES
(pH 7.5), 500 mM NaCl, 10% (v/v) glycerol, 0.5% (w/v) lauryl maltose
neopentyl glycol (LMNG) (Anatrace), 0.1% (w/v) CHS, 10 mM MgCl2 and
20 mM imidazole, and incubated for 4 h at 4 °C before HRV3C protease
(at a ratio of 1:20) was added to cleave the 10×His-tag. The flow-through
was collected and 3 column volumes of exchange buffer were added
and collected. The protein solution was concentrated to a volume of
0.5 ml and loaded onto a Superdex 200 10/300 column (GE Healthcare)
equilibrated with 20 mM HEPES (pH 7.5), 100 mM NaCl, 0.00075% (w/v)
LMNG and 0.00015% (w/v) CHS. Peak fractions corresponding to GPR52
were pooled and concentrated to 20 mg ml⁻¹.
The Gαs subunit of mini-Gs (mini-Gαs) used in this study was the same
as that used in a previous study of the cryo-EM structure of A2AR–mini-Gs–Nb35.
In brief, mini-Gαs was expressed in the E. coli strain BL21 and
purified by Ni²⁺ affinity chromatography, followed by cleavage of the
His-tag using TEV protease and negative purification on Ni²⁺-NTA agarose
to remove the TEV and undigested mini-Gαs. The Gβγ and Nb35
purification was performed following previously described protocols.

Heterodimeric Gβγ and mini-Gαs were mixed in a 1:1.2 ratio and incubated
on ice for 4 h to form heterotrimeric mini-Gs. Excess mini-Gαs was
removed by size-exclusion chromatography on a Superdex 200 10/300
column with a running buffer of 20 mM HEPES (pH 7.5), 100 mM NaCl,
10% glycerol, 1 mM MgCl2, 1 µM GDP and 0.1 mM Tris(2-carboxyethyl)phosphine
(TCEP).

Purified GPR52 was mixed with a 1.2-fold molar excess of heterotrimeric
mini-Gs and 15-fold molar excess of Nb35 in the presence of
apyrase (0.2 U ml⁻¹). The mixture was incubated on ice overnight. The
sample was loaded on to a Superdex 200 10/300 column. Peak fractions
containing the GPR52–mini-Gs–Nb35 complex were pooled and
concentrated to 4 mg ml⁻¹.


Preparation of vitrified sample 

A 3-µl droplet of purified sample at a concentration of around 2.5 mg
ml⁻¹ was applied to glow-discharged holey carbon grids (Quantifoil, 200
mesh copper R1.2/1.3). Excess sample was removed by blotting with
filter paper for 3.5 s before plunge-freezing in liquid ethane using a
FEI Vitrobot Mark IV at 100% humidity and 4 °C.


Cryo-EM data acquisition 

Images were collected on a FEI Titan Krios microscope at 300 kV using
a Falcon III detector in electron-counting mode. EPU software (FEI)
was used for automatic data collection. Data were collected in three
independent sessions to give a total of 7,287 movies. Each stack was
exposed for 53 s with a dose of 0.95 e⁻ per pixel per second, resulting
in a total of 32 frames per stack. The total dose was around 40
e⁻ Å⁻² for each stack.
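
The dose figures above can be cross-checked with a short calculation. The sketch below is illustrative only: it assumes that the quoted rate (0.95 e⁻ per pixel per second) refers to detector pixels sampled at the 1.09 Å pixel size listed in Extended Data Table 2, and the function name is hypothetical. Under those assumptions the result is close to the "around 40" stated in the text.

```python
def total_dose(rate_e_per_pixel_s: float, exposure_s: float, pixel_size_A: float) -> float:
    """Accumulated electron dose in e-/A^2 for one movie stack."""
    electrons_per_pixel = rate_e_per_pixel_s * exposure_s
    pixel_area_A2 = pixel_size_A ** 2
    return electrons_per_pixel / pixel_area_A2


# Values quoted in the text; the 1.09 A pixel size is an assumption
# taken from Extended Data Table 2.
dose = total_dose(rate_e_per_pixel_s=0.95, exposure_s=53.0, pixel_size_A=1.09)
dose_per_frame = dose / 32  # 32 frames per stack

print(f"total dose: {dose:.1f} e-/A^2, per frame: {dose_per_frame:.2f} e-/A^2")
```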


Cryo-EM data processing and model building 

Drift correction and dose weighting were performed on the 7,287 movies
with MotionCor2. The contrast transfer function (CTF) parameters
were estimated by Gctf and all three-dimensional (3D) reconstructions
were performed with RELION-3. Around 2,000 particles were manually
picked to generate the templates for automatic picking of particles.
After automatic picking, 8,861,544 particles were extracted using a box
size of 256 pixels. Next, false positives or 'bad particles' were eliminated
over two rounds of reference-free two-dimensional (2D) classification.
A total of 4,849,328 particles were finally selected and subjected to a
global angular search 3D classification using a 60 Å low-pass-filtered
initial model that was generated from the A2AR–Gs complex (PDB 6GDG)
with 1 class and 25 iterations. The outputs of the 21st to 25th iterations
were subjected to local angular search 3D classification with separate
classes. After merging of all good classes and removal of duplicated
particles, 1,551,107 particles were subjected to 3D autorefinement. Then
the particles were used for further 3D classification with no alignment,
and 651,456 particles were selected for further autorefinement. After
CTF refinement and Bayesian polishing, final 3D refinement resulted in
an overall structure at 3.32-Å resolution. All resolutions were based on
the gold standard (two halves of data refined independently) Fourier
shell correlation (FSC) = 0.143 criterion. RELION-3 was used to estimate
the variations in local resolution of the density map.
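
As an illustration of how an FSC = 0.143 criterion turns a correlation curve into a single resolution value, the sketch below finds the first shell at which the curve drops below the threshold and interpolates linearly between neighbouring shells. This is not RELION's implementation; the function name is hypothetical and the curve is made-up example data, not data from this study.

```python
from typing import Optional, Sequence


def resolution_at_threshold(
    spatial_freq: Sequence[float],
    fsc: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Resolution (A) at which the FSC curve first falls below `threshold`.

    spatial_freq: shell frequencies in 1/A, in ascending order.
    fsc: half-map correlation for each shell.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i] < threshold <= fsc[i - 1]:
            # Linear interpolation between the two shells around the crossing.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            f_cross = spatial_freq[i - 1] + frac * (spatial_freq[i] - spatial_freq[i - 1])
            return 1.0 / f_cross
    return None


# Made-up toy curve, for illustration only.
toy_freq = [0.05, 0.10, 0.20, 0.30, 0.40]
toy_fsc = [1.00, 0.98, 0.80, 0.20, 0.02]
toy_resolution = resolution_at_threshold(toy_freq, toy_fsc)
print(toy_resolution)
```

Real FSC curves are noisier than this toy example, so processing packages typically handle the crossing more carefully than a single linear interpolation.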

The crystal structures of human GPR52-apo (PDB 6LI2) and the G-protein
complex taken from A2AR bound to the mini-Gs heterotrimer (PDB
6GDG) were used as initial models for model rebuilding and refinement
against the electron microscopy map. All models were docked into the
electron microscopy density map using UCSF Chimera, followed by
iterative manual adjustment in Coot, fragment-based refinement
with Rosetta and real-space refinement using PHENIX. The model
statistics were validated using MolProbity. Structural figures were
prepared in UCSF Chimera and PyMOL (https://pymol.org/2/). Model
overfitting was evaluated by refinement against one cryo-EM half map.
FSC curves were calculated on the basis of the final GPR52–mini-Gs–Nb35
model and the half map that was used for refinement, as well as
the other half map that was used for cross-validation.


GPR52–Gs-mediated cAMP assay

The wild-type GPR52 gene was subcloned in vector pcDNA3.0 with an
N-terminal HA signal peptide and Flag tag. Mutations were introduced
by QuickChange PCR. HEK293T cells were cultured in 1× DMEM supplemented
with 10% (v/v) fetal bovine serum and incubated in 5% CO2
at 37 °C. Before the transfection, cells were seeded on 6-cm cell-culture
plates. When cells had grown to approximately 2 × 10⁶, wild-type or
mutant GPR52 DNA was transfected into cells using Lipofectamine
2000 reagent (Life Technologies). After a 24-h culture, cells were collected
and resuspended in PBS containing 500 µM IBMX at a density
of 2 × 10⁵ cells per ml. Cells were then plated onto 384-well assay plates
at 1,000 cells per 5 µl per well. Another 5 µl of buffer containing c17 at
various concentrations was added to the cells, and they were incubated
for 30 min at 37 °C. Intracellular cAMP measurement was
carried out with a Cisbio HTRF Dynamic 2 cAMP kit (Cisbio) and an EnVision
multi-plate reader according to the manufacturer's instructions.
The HTRF ratio was converted to a response (%) using the following
formula: response (%) = (ratio_sample − ratio_bg)/(ratio_max − ratio_bg) × 100.
Cell-surface expression for each mutant was monitored by a fluorescence-activated
cell sorting (FACS) assay. In brief, the expressed cells
were incubated with mouse anti-Flag M2–fluorescein isothiocyanate
(FITC) antibody (Sigma) for 20 min at 4 °C, and then a 9-fold excess
of PBS was added to cells. Finally, the surface expression of GPR52 was
monitored by detecting the fluorescent intensity of FITC using a Guava
EasyCyte HT system (Millipore).
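
The normalization and plating arithmetic in this assay can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes that the two reference terms in the printed formula are a background ratio and a maximal-response ratio, and the function and parameter names are hypothetical. The second function checks that 1,000 cells in 5 µl matches the stated suspension density of 2 × 10⁵ cells per ml.

```python
def response_percent(ratio_sample: float, ratio_background: float, ratio_max: float) -> float:
    """Normalize an HTRF ratio to a percentage between two reference ratios.

    ratio_background and ratio_max are assumed reference readings
    (hypothetical names); the result is 0 at background and 100 at maximum.
    """
    span = ratio_max - ratio_background
    if span == 0:
        raise ValueError("reference ratios must differ")
    return (ratio_sample - ratio_background) / span * 100.0


def cells_per_ml(cells_per_well: float, well_volume_ul: float) -> float:
    """Convert a per-well plating density to cells per millilitre."""
    return cells_per_well / well_volume_ul * 1000.0


# Plating density quoted in the text: 1,000 cells per 5 ul per well.
density = cells_per_ml(cells_per_well=1000, well_volume_ul=5)

# Made-up ratios, for illustration only.
example_response = response_percent(ratio_sample=1500, ratio_background=1000, ratio_max=3000)
print(density, example_response)
```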


Construction of structural similarity network 

Class-A GPCRs for which the inactive structure has been reported
(Supplementary Table 4) were used to construct the similarity network.
For receptors with more than one reported inactive structure,
only one structure was selected (receptor state, completeness and
resolution were considered). The r.m.s.d. values between Cα atoms
in the transmembrane helices were calculated using UCSF Chimera
for edge attributes. In total, 192 residues were included: Ballesteros
and Weinstein numbers 1.35–57, 2.37–63, 3.22–56, 4.39–63, 5.36–65,
6.33–59 and 7.31–55. The visualization of the network was generated
with Cytoscape software.
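
The residue count and the edge attribute described above can be illustrated with a short sketch. It confirms that the seven listed Ballesteros and Weinstein ranges add up to 192 positions and shows a plain r.m.s.d. over paired Cα coordinates. The r.m.s.d. function assumes that the two structures have already been superposed and that atoms are supplied in matching order; it is not the UCSF Chimera routine used in the study, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

# Ballesteros-Weinstein ranges quoted in the text: helix -> (first, last) position.
BW_RANGES: Dict[int, Tuple[int, int]] = {
    1: (35, 57),
    2: (37, 63),
    3: (22, 56),
    4: (39, 63),
    5: (36, 65),
    6: (33, 59),
    7: (31, 55),
}

Coord = Tuple[float, float, float]


def count_positions(ranges: Dict[int, Tuple[int, int]]) -> int:
    """Total number of residue positions covered by inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges.values())


def rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, pre-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


n_positions = count_positions(BW_RANGES)
print(n_positions)
```

In a network of this kind, each receptor pair would contribute one edge weighted by its r.m.s.d. value.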


Structure and sequence comparison 

The calculation of the pocket volume was performed with the program
Cavity. The representation of the GPR52 and GPR21 sequence alignment
was generated using the ESPript website (http://espript.ibcp.fr).


Molecular docking 

Molecular docking was performed using Schrödinger software.
Processing of the protein structure was performed with the Protein
Preparation Wizard (https://www.schrodinger.com/protein-preparation-wizard);
conversion of ligands from 2D to 3D structures was
performed using LigPrep (https://www.schrodinger.com/ligprep);
and docking was performed with Glide 6.9 (https://www.schrodinger.com/glide)
in standard precision. The cartoons of all structures were
generated by PyMOL.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability


The coordinates and structure factors for GPR52-Fla-apo, GPR52-Rub-apo,
GPR52-Fla-c17 and GPR52–mini-Gs–Nb35 have been deposited in
the PDB with accession codes 6LI1, 6LI2, 6LI0 and 6LI3, respectively.
The cryo-EM 3D maps of the GPR52–mini-Gs–Nb35 complex have been
deposited in the Electron Microscopy Data Bank (EMDB) with accession
code EMD-0902. All other data relating to this study are available from
the corresponding authors on reasonable request.
Extended Data Fig. 1 | Engineering and crystallization of GPR52. a, Schematic
of the GPR52 constructs that were used for crystallization (residues 17–340).
Thermostabilizing mutations (red) are A130W, A264L, W278Q, C314P, S318A,
N321D and V323T. Cysteine residues (yellow), TEV cleavage site (pink) and
disulfide bonds (orange dashed lines) are shown. b, Left, analytical
size-exclusion chromatography of GPR52. Experiments were repeated three times
with similar results. Right, superposition of GPR52-Fla-apo, GPR52-Rub-apo
and GPR52-c17 structures shows the overall conserved helical arrangement.
c, Crystal images of the GPR52-c17 complex (left), GPR52-Rub-apo (middle) and
GPR52-Fla-apo (right). Experiments were repeated three times with similar
results. d, ECL2 comparison of two apo structures (GPR52-Fla-apo, orange;
GPR52-Rub-apo, cyan). The crystal packing of GPR52-Fla-apo (middle) and
GPR52-Rub-apo (right) is also shown. Helix 8 of GPR52-Rub-apo is highlighted
in red.
c  Functional potency (pEC50) of c17

Construct            pEC50
WT                   7.51 ± 0.23
A130W                6.86 ± 0.15
A264L                7.29 ± 0.10
W278Q                6.82 ± 0.25
C314P                6.73 ± 0.09
S318A                7.39 ± 0.16
N321D                7.11 ± 0.15
V323T                7.98 ± 0.20
ICL3-Rub             NA
ICL3-Fla             NA
GPR52-Rub-apo        NA
GPR52-Fla-apo        NA
Cryo-EM construct    6.59 ± 0.09
K182E                6.19 ± 0.19
C193A                7.55 ± 0.15
ΔECL2(182–198)       NA
Δ182–190(GS)         NA
Δ191–199(GS)         6.08 ± 0.05
HEK293T              NA

d  Relative surface expression (%)

Construct            Expression
WT                   100 ± 12.9
A130W                105.9 ± 12.4
A264L                77.3 ± 5.5
W278Q                87.3 ± 14
C314P                100 ± 20.5
S318A                88.6 ± 17.7
N321D                72.3 ± 8.5
V323T                54.5 ± 10.9
ICL3-Rub             66.8 ± 6.2
ICL3-Fla             70.0 ± 1.6
GPR52-Rub-apo        102.3 ± 6.8
GPR52-Fla-apo        96.4 ± 14.7
Cryo-EM construct    78.6 ± 15.0
K182E                36.3 ± 6.3
C193A                48.0 ± 4.6
ΔECL2(182–198)       69.6 ± 11.0
Δ182–190(GS)         30.5 ± 3.7
Δ191–199(GS)         38.2 ± 10.1
HEK293T              NA
Extended Data Fig. 2 | Effects of GPR52 mutations on the potency of c17.
a, Basal activity of GPR52 mutants. Response-level values were compared with
wild-type GPR52 by two-way ANOVA without repeated measures, followed by
Dunnett's post hoc test (****P < 0.0001). Data are mean ± s.e.m. (n = 3).
b, Mapping of mutated residues (green) onto GPR52 crystal structures. c17 is
shown in orange. c, Summary of functional potency (pEC50) values of c17 on the
GPR52 mutants. Data are mean ± s.e.m. (n = 3). d, Relative surface expression
levels of mutant constructs were monitored by a FACS staining assay (Methods)
and normalized to the expression levels of wild-type GPR52. Data are
mean ± s.e.m. (n = 3). NA, not available.
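
For readers less used to pEC50 values, the sketch below converts them to concentrations using the standard definition pEC50 = −log10(EC50 in mol per litre), and expresses the difference between two constructs as a fold change. The function names are hypothetical; the two input values are the wild-type and C314P entries from panel c.

```python
def pec50_to_ec50_nM(pec50: float) -> float:
    """Convert pEC50 (negative log10 of EC50 in mol/l) to EC50 in nanomolar."""
    return 10 ** (-pec50) * 1e9


def fold_shift(pec50_reference: float, pec50_mutant: float) -> float:
    """Fold increase in EC50 of the mutant relative to the reference."""
    return 10 ** (pec50_reference - pec50_mutant)


wt_ec50_nM = pec50_to_ec50_nM(7.51)      # wild type, panel c
c314p_shift = fold_shift(7.51, 6.73)     # wild type versus C314P, panel c
print(f"WT EC50: {wt_ec50_nM:.1f} nM; C314P shift: {c314p_shift:.1f}-fold")
```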
Extended Data Fig. 3 | Comparison of GPR52 with other
a, Comparison of the ECL2 ALM-occupied orthosteric binding pocket of GPR52
with agonist-bound pockets of A1R (PDB 6D9H), A2AR (PDB 6GDG) and 5-HT1BR
(PDB 6G79). b, PIF motif comparison of mini-Gs-coupled GPR52 (green) and
A2AR (yellow; PDB 6GDG). c, Side view (left), cytoplasmic view (middle) and
extracellular view (right) of A2AR in the mini-Gs-coupled state (green, PDB
6GDG) compared with the adenosine analogue (NECA; PDB 2YDV)-bound state
(pink). d, Side view (left), cytoplasmic view (middle) and extracellular view
(right) of 5-HT1BR in the mini-Go-coupled state (in complex with donitriptan)
(green; PDB 6G79) compared with the ergotamine-bound state (pink; PDB
4IAR). e, DRY motif of GPR52-apo, 5-HT1BR–ergotamine (PDB 4IAR) and
A2AR–NECA (PDB 2YDV).
Extended Data Fig. 4 | Cryo-EM analysis of the GPR52–mini-Gs–Nb35
complex. a, b, Size-exclusion chromatography profile (a) and corresponding
SDS–PAGE gel (b) of the purified GPR52–mini-Gs–Nb35 complex. Experiments
were repeated three times with similar results. c, Representative reference-free
2D cryo-EM average of the GPR52–mini-Gs–Nb35 complex. d, GPR52 with point
mutations A130W and C314P maintained around 50% of the activity
relative to the wild-type protein, according to the cAMP response level. Data
are mean ± s.e.m. (n = 3). e, Representative cryo-EM micrograph of the
GPR52–mini-Gs–Nb35 complex. f, Reference-free 2D averages of the
GPR52–mini-Gs–Nb35 complex. g, Final 3D density map coloured according
to the local resolution. h, Gold-standard FSC curves, showing the overall
nominal resolution at 3.3 Å. i, Angular distribution of the particles used for
the final reconstruction of the GPR52–mini-Gs–Nb35 complex.
Extended Data Fig. 5 | Flow chart for the cryo-EM data processing and structure determination of the GPR52–mini-Gs–Nb35 complex. See Methods for details.
The final reconstruction has an average resolution of 3.3 Å. All the images in this figure were created in UCSF Chimera.
Extended Data Fig. 6 | Cryo-EM map quality and ECL2 comparison. a, Atomic
model of GPR52 transmembrane helices, ECL2 and ICL2 in the cryo-EM density
map. The molecular model is shown in stick representation and the cryo-EM
map as mesh. b, Stereo views of the electron density maps of ECL2. Left, the
2Fo − Fc map of ECL2 from the GPR52-Rub-apo crystal structure. Right, the
electron density map of ECL2 from the GPR52–mini-Gs complex structure.
h  Buried surface area of receptor–G-protein interfaces

Receptor          GPR52    β2AR     A2AR     GLP-1R   5-HT1BR
Interface (Å²)    1,790    1,520    4,400    1,270    920

Extended Data Fig. 7 | Comparison of the GPR52–mini-Gs interface with that
of other GPCR–G-protein complexes. a, b, Front view (a) and back view (b) of
the GPR52–mini-Gs interface. GPR52 (centre) and mini-Gs (right) are in surface
representation and coloured according to the electrostatic potential (blue,
positive; red, negative). c, The GPR52–mini-Gs interface. GPR52 and mini-Gs are
in cartoon representation and coloured in green and grey, respectively. A
magnified view of the interface is shown on the right in surface representation.
d–g, Magnified views of the interface between other receptors and G proteins
in surface representation. h, Buried surface area of the interfaces between
receptors and G proteins, calculated by PyMOL.
Extended Data Fig. 8 | Docking position of GPR52 agonists, structural
similarity network and sequence alignment of GPR52. a, Docking position of
GPR52 agonists with different scaffolds: WO-459 (left); 7m (centre) and FTBMT
(right). The GPR52 residues that are involved in ligand binding are shown as
green sticks and c17 is shown as grey sticks, for reference. b, Structural
similarity network of class-A GPCRs with reported inactive structures.
c, Sequence alignment of GPR52 and GPR21 (yellow, less than 5.0 Å to ligand;
green indicates key residues for structural features).


Extended Data Table 1 | Data collection and structure refinement statistics

Structure               GPR52-Fla-apo           GPR52-Rub-apo           GPR52-Fla-c17
PDB ID                  6LI1                    6LI2                    6LI0

Data collection
Space group             P212121                 I222                    P212121
Cell dimensions
  a, b, c (Å)           66.65, 79.89, 148.30    77.22, 113.23, 138.62   59.97, 88.36, 156.28
  α, β, γ (°)           90.0, 90.0, 90.0        90.0, 90.0, 90.0        90.0, 90.0, 90.0
Resolution (Å)          28.41–2.90 (2.98–2.90)  29.25–2.80 (2.90–2.80)  41.89–2.20 (2.26–2.20)
Rmerge                  0.24 (2.75)             0.16 (0.97)             0.18 (2.89)
I/σ(I)                  11.8 (1.0)              13.2 (1.7)              18.3 (1.0)
Completeness (%)        97.60 (98.80)           96.9 (81.10)            99.37 (96.11)
Redundancy              9.2 (9.4)               8.2 (8.0)               26.5 (17.5)
CC1/2                   0.99 (0.40)             1.02 (0.90)             0.99 (0.44)

Refinement
Resolution (Å)          28.41–2.90              29.25–2.80              41.89–2.20
No. reflections         16,462                  11,830                  40,732
Rwork / Rfree           0.244 / 0.267           0.241 / 0.263           0.194 / 0.220
No. of atoms
  Protein               3,339                   2,761                   3,479
  Ligand                n/a                     n/a                     34
  Lipids and others     79                      195                     397
B-factors (Å²)
  Wilson / Overall      97.0 / 88.1             76.8 / 68.2             60.3 / 88.5
  Protein               88.5                    67.4                    80.4
  Ligand                n/a                     n/a                     69.3
  Lipids and others     71.3                    80.2                    106.6
R.m.s. deviations
  Bond lengths (Å)      0.008                   0.008                   0.011
  Bond angles (°)       1.54                    1.44                    1.66

20 crystals used for structure determination. Values in parentheses are for the highest-resolution shell.
28 crystals used for structure determination. Values in parentheses are for the highest-resolution shell.
57 crystals used for structure determination. Values in parentheses are for the highest-resolution shell.
Extended Data Table 2 | Cryo-EM data collection and refinement statistics

                                  GPR52-miniGsβγ-Nb35
                                  (EMDB-0902)
                                  (PDB 6LI3)

Data collection and processing
Magnification                     75,000
Voltage (kV)                      300
Electron exposure (e⁻/Å²)         40.0
Defocus range (µm)                −1.0 to −2.6
Pixel size (Å)                    1.09
Symmetry imposed                  C1
Initial particle images (no.)     8,861,544
Final particle images (no.)       651,465
Map resolution (Å)                3.3
  FSC threshold                   0.143
Map resolution range (Å)          3.0–7.7

Refinement
Initial model used (PDB code)
Model resolution (Å)
  FSC threshold
Model resolution range (Å)
Map sharpening B factor (Å²)
Model composition
  Non-hydrogen atoms
  Protein residues
  Ligands
B factors (Å²)
  Protein
  Ligand
R.m.s. deviations
  Bond lengths (Å)
  Bond angles (°)
Validation
  MolProbity score
  Clashscore
  Poor rotamers (%)
Ramachandran plot
  Favored (%)
  Allowed (%)
  Disallowed (%)
and LD650 as indicated. Also, we site-specifically labelled LIV-BP mutant
D151C, not E181C. The sentence “The rate of single-turnover transport
decreased by nearly an order of magnitude when the external pH was
increased from 6 to 8 (Fig. 3c)” should read “The rate of single-turnover
transport decreased by nearly an order of magnitude when the external
pH was decreased from 8 to 6 (Fig. 3c)”. These errors have no effect
on the results presented or the interpretations and conclusions made.
These errors have been corrected online.


E6 | Nature | Vol 579 | 5 March 2020




Advice, technology and tools 


Work 


Send your careers story to: naturecareerseditor@nature.com




Research using highly pathogenic organisms often needs to be done in full protective suiting. 


BEHIND THE SCENES IN A BIOSAFETY OFFICE

It’s never a dull day for those tasked with keeping
biological research safe for all. By Kendall Powell


David Gillum received a flood of phone
calls in January, after his university
announced that someone in the campus
community had been diagnosed
with the COVID-19 coronavirus.
Gillum is the director of environmental
health and safety at Arizona State University
in Tempe, and many of the calls were from
biosafety officers around the globe. They
were seeking his expertise and guidance in
the event that their own institutions were in
the same situation.

“Emergencies happen, and it’s up to you to
be the one to keep people calm,” says Gillum.
But most days in the life of an institutional
biosafety officer are much less dramatic.
The officers collaborate with their
institute's scientists to work out how to safely
conduct biological research, such as that
involving infectious agents or modified DNA.
But their roles have expanded over the past
decade, as biological research and its hazards
have become more complex. Large research
institutes and campuses can have dozens of
officers; smaller institutes often cover the role
in their safety offices.

In Mexico and other Latin American countries,
the position is often given the title of
biosafety responsible, says Luis Ochoa Carrera,
head of the Epidemiological Surveillance
and Research Laboratory Network at the Mexican
Institute for Social Security in Mexico City.

“As a graduate student, I thought biosafety
was that person with their clipboard to tell you
what you were doing wrong,” says Meghan Seltzer,
manager of safety, health and security at
the Howard Hughes Medical Institute's Janelia
Research Campus in Ashburn, Virginia. “But
there is so much more to it.” In a typical week,
an officer might write a risk assessment for
a project that would add chemically synthesized
designer molecules to living cells; review
the design and construction of a tissue-culture
room suitable for work that requires strict
controls; or assess the risks to society from
research that modifies the influenza virus.
“There’s never a dull moment in our jobs,”
says Danielle Rintala, biological-safety officer
for the University of Wisconsin-Milwaukee.
“There’s always something to learn about.”


Many paths 


Few officers hold degrees relating specifically
to biosafety, and in the United States, at
least, most qualifications tend to be offered by
institutes other than universities. There are,
however, many courses that lead to accreditation
as a biosafety professional, including
the National Biosafety and Biocontainment
Training Program at the US National Institutes
of Health (NIH) in Bethesda, Maryland, and a
programme offered by the Biosafety Training
Institute at the University of Edinburgh, UK.
Seltzer, for instance, completed the two-year
NIH programme.

Biosafety officers can have backgrounds
in life sciences, public health, medicine,
engineering, education and emergency
response, among others, says Seltzer, who is
also a member of a careers task force for ABSA
International, the Association for Biosafety and
Biosecurity in Mundelein, Illinois. They tend to
hold a range of qualifications, from undergraduate
diplomas through to doctorates. Those
in the job recommend at least a master’s-level
knowledge of microbiology or molecular biology,
because those fields cover the most risky
research. That level of education is important,
they say, because the training will be used
daily in critically evaluating complex scientific
ideas, anticipating potential hazards and
executing research protocols. Ochoa Carrera,
for example, trained in chemistry, pharmacy
and biology, and has a master’s in public-health
management. Also important, he and others
say, is laboratory experience working with
infectious fungi, bacteria, viruses or parasites.

And Toshinori Tanaka, biosafety specialist
for the Okinawa Institute of Science and
Technology Graduate University (OIST) in
Japan, has a PhD in plant science. He says
that his posts working for a local government
office on forestry and soil science and on agricultural
policy in Nagasaki gave him insights
into government regulations and priorities.

COMFORT ZONE

Biosafety officers have standard work weeks with set hours. Most US officers earn $50,000–110,000.


Competitive salaries 

In 2013, Gillum and his colleagues found that
salaries for biosafety professionals working in
the United States ranged from up to US$70,000
per year for entry-level positions to around
$110,000 for those with 15 years or more
experience (see ‘Comfort zone’, and D. Gillum
et al. Appl. Biosaf. 18, 106–115; 2013). In Austria,
early-career professionals can expect to earn
a salary of €69,040 (US$75,600), similar to
that of a postdoctoral fellow, says Gabriel
O’Riordain, head of scientific support for the Austrian
Academy of Science's Research Center for
Molecular Medicine in Vienna.


SAFETY TRAITS FIRST 


The mindsets and skills of great biosafety
professionals.

+ Willing to wear personal protective
equipment. Officers often set the tone for
safety culture at their institutions.

+ Enjoys learning. Biosafety professionals
analyse research and protocols across the
entire realm of biology, especially leading-edge
technologies.

+ Listens closely. Effective professionals
don’t pass judgement on mistakes,
but rather offer reassurance and help
researchers learn from them.

+ Mechanically inclined. Officers might
need to assess, and possibly repair, lab
equipment ranging from airflow vents to
pipettors.

+ Cool under pressure. Biosafety officers
must be able to remain calm in a crisis,
and to think clearly and quickly to solve a
problem.
Zachary Wilson, assistant biosafety officer
at the Anschutz Medical Campus of the University
of Colorado Denver, appreciates the
fact that his salary is not tied to grant funding,
yet he is still involved in analysing and shaping
research projects. He says he gets to learn what
all the biologists on campus are investigating,
ranging from what mechanisms the tuberculosis
bacterium uses to survive antibiotic treatment
to how autoimmune disorders develop
in mice with humanized immune systems. “I
didn’t want to study one protein in one bacterium
for my entire life,” he says.

Many biosafety officers rank their involvement
in research, albeit in a supporting role,
as one of the key attractions of the job. “One of
the things I've loved most about this job is that
I'm still involved in and helping the research
community,” says Andrea Ladd, assistant
director of the environment, health and safety
office at the University of Wisconsin-Madison.

Rintala, too, enjoys learning about all the 
biology research across her campus. “I see 
every single person working with biological 
materials,” she says, from engineers devel- 
oping microscopy techniques to researchers 
studying pollution in the nearby Great Lakes. 

Peter Farina, director of safety at National
Jewish Health hospital and research institute
in Denver, finds that his job sometimes crosses
over into the clinical realm. One of the institution's
specialities is respiratory diseases, and
many researchers work closely with the tuberculosis
bacterium. He often needs to work with
the hospital’s infection-prevention officer to
make sure that protections are in place for
researchers, clinicians and patients, and that
wards and clinics are disinfected properly after
procedures.


Days spiced with variety 

Many researchers find the position appealing
for its standard working week, with set hours
and flexibility to work from home on some
days. Before getting her current role, Ladd 
had spent 1 years as principal investigator 
studying heart development and disease at
the Cleveland Clinic’s Lerner Research Institute
in Ohio. She was spending 40 hours per
week writing grants and another 30 hours at
the bench. It was an unsustainable schedule
and she knew it had to change. Even though she
had no experience as a biosafety officer or any
training in microbiology, she was aware of the
importance of biosafety procedures and was
comfortable with the administrative side of
research and how to manage teams and budgets.
So she took a gamble and applied for her
job at the University of Wisconsin-Madison.
“I pitched hard in the interview that I could learn
any of the science I needed to learn,” she says.

Many biosafety officers work alongside
colleagues who oversee the handling and disposal
of hazardous waste, monitoring of air
and water supplies, handling of chemicals and
radiation and employee health and well-being.
They are also members of committees that
review and approve research projects involving
biological materials such as infectious
agents, recombinant DNA or synthetic nucleic
acids, as well as committees that oversee the
care of research animals.

Reviewing research protocols helps officers
to determine the biosafety level (BSL) at which
a research team should be working, ranging
from 1 for the least hazardous to 4 for the most.
That means assessing which experiments must
be done in a controlled cabinet, what protective
gear should be worn and how biohazardous
waste should be disposed of.

Another large part of the job involves
building relationships with laboratories on
campus. “If I take the time to sit and learn why
researchers want to do a particular experiment,
it really helps me do my job,” says Seltzer. “It
helps me solve the safety problem better,
which helps them get their research done, and
I get excited about it.”

Building trust is also crucial for times when
something goes wrong. The most common
incidents are needle pricks, eye splashes, cuts
from sharp instruments, bites from research
animals and spills. It's very rare for a biosafety
officer to have to suit up in protective coveralls
and respirators for an emergency. Like firefighters,
however, they need to be prepared for
such scenarios. And unusual incidents sometimes
require officers to quickly find people on
campus who can advise them on the science,
the facility or both, says Farina.

That said, it doesn’t hurt to be an “addict to 
adrenaline”, says Ochoa Carrera, who formerly 
worked as a BSL3 coordinator for Mexico's 
Ministry of Health. “I absolutely like that part 
of the job. You need to be aware of the potential 
things that can happen inside or outside 
of the lab and be ready to respond to all types 
of threats.” His mettle was tested in 2013, when 
he led a mobile laboratory team to investi- 
gate a cholera outbreak in the rural region of 
Hidalgo for two months. The situation called 
for both his technical expertise in handling 
pathogens and his leadership skills. “You have 
to know how to handle the pressure,” he says. 

Photo caption: David Gillum is a safety officer at Arizona State University in Tempe. 

Most of the job is about prevention. “It’s 
always been our office’s practice that we 
don’t want to end up on the local news,” Rintala 
says. And officers lead annual training for 
lab workers, including when and how to use 
safety glasses and other protective gear, how 
to transport or pipette biological hazards, and 
how to use eye-washing stations and fire extinguishers. 
They also inspect equipment and 
procedures. Tanaka says that he is continually 
reminding researchers in tropical Okinawa not 
to wear open sandals in the lab. 

“One of the things I’ve 
loved most about this 
job is that I’m still involved 
in and helping the research 
community.” 

But when incidents do occur, biosafety 
officers must remain calm, listen carefully 
and without judgement to what happened, 
and see to the researcher's immediate welfare 
(see ‘Safety traits first’). Rintala says that it’s 
helpful to maintain a reassuring, non-critical 
demeanour and make sure that researchers 
don’t feel blamed or shamed for making a 
mistake. “It’s really important not to treat it 
as punishment,” she says, “but rather, to treat 
it as a learning experience.” That means that 
researchers will be more likely to report problems 
and ask biosafety officers for help the 
next time. Once the situation is under control, 
biosafety officers need to investigate what 
went wrong and discuss how it could be prevented 
in the future. 


Challenges and rewards 

Tanaka says that one of the most rewarding 
and challenging parts of the job is deciding 
how to work safely with new technologies. 
“The more advanced the research is, the fewer 
regulations exist to cover it,” he says. In one 
such example, OIST researchers created a 
hybrid virus, combining parts from one that 
causes an animal disease and from another 
that is harmless. No regulations existed for 
handling such a hybrid. “The biosafety officer 
on site needs to [balance] safety measures in a 
way that also ensures the freedom of research.” 

Biosafety officers describe the job as a 
great career option for researchers needing 
a change of pace but wanting to stay involved 
in research. Although it requires pivoting to 
a supporting role, they find that role very satisfying. 
And sometimes, they find themselves 
in an adrenaline-filled moment, suited up in 
protective gear. 

“It might seem like a scene out of the movie 
Contagion,” Wilson says. “But if we don't feel 
comfortable working in those labs, then we're 
doing something wrong.” 


Kendall Powell is a freelance writer in 
Lafayette, Colorado. 
A HOME FOR EVERY 
IMAGING DATA SET 


Repositories let researchers store, share and access life-science 
images — and maybe even extract new findings. By Amber Dance 


When Sjors Scheres set out to 
develop a tool to reverse flaws in 
cryo-electron microscopy images, 
he needed lots of data on which 
to test it. So Scheres, a structural 
biologist at the MRC Laboratory of Molecular 
Biology (LMB) in Cambridge, UK, turned to 
the Electron Microscopy Public Image Archive 
(EMPIAR), a database of raw images. There he 
downloaded, for free, data collected by the 
lab of Gabriel Lander, a structural biologist at 
Scripps Research in La Jolla, California. 
Using his new technique, Scheres was able 
to squeeze sharper images from those data, 
improving the resolution of one structure 
from 3.1 ångströms to 2.3 ångströms. 
“That's precisely why we posted the data,” 
says Lander. “We knew some brilliant people 
out there would be able to improve on 
our processing.” 

Services such as EMPIAR give researchers 
a central location in which to store, share and 
access a rapidly expanding corpus of biological 
images. “The data aren’t just one picture 
any more,” says Joshua Vogelstein, a neurostatistician 
at Johns Hopkins University in 
Baltimore, Maryland. Movies, 3D images and 
microscope-based screening data can take up 
gigabytes or terabytes of storage, and can’t 
be e-mailed back and forth in the same way as 
individual TIFF or JPEG files. Moreover, grant 
agencies and journals increasingly require 
scientists to make their data available to 
all, but don’t necessarily offer to host them. 
EMPIAR and its kin fill that gap, and often provide 
a digital object identifier or other citation 
so researchers can get credit for their data. 

“Are you struggling to load your images?” 
asks Forrest Collman, a neuroscientist at the 
Allen Institute for Brain Science in Seattle, 
Washington. “Are you particularly struggling 
to share?” If so, he says, “looking into this kind 
of service makes sense for you”. 

In 2019, when Collman spotted an odd-looking 
neuron in one of his electron-microscopy 
data sets, it was easy for him to send a colleague 
a link to that spot in the data repository, 
rather than a bulky file. She noticed another 
unique feature, and Collman identified a few 
similar cells. They might turn out to be a new 
type of neuron, Collman says. 

There are a number of other image warehouses 
available, among them the Image Data 
Resource (IDR). Both it and EMPIAR are hosted 
by the European Molecular Biology Laboratory’s 
European Bioinformatics Institute (EMBL-EBI) 
in Hinxton, UK. Further options include, 
but are not limited to, NeuroData, a platform 
that Vogelstein set up to host neuroanatomy 
files, and the Systems Science of Biological 
Dynamics (SSBD) database at Japan's RIKEN network 
of research institutes. Advocates expect 
these platforms to follow the model of established 
DNA- and protein-sequence resources 
such as GenBank and the Protein Data Bank, 
which have powered an array of analyses and 
spawned the field of bioinformatics. 

“We're very early days,” says Jason Swedlow, 
a quantitative cell biologist at the University of 
Dundee, UK. But he expects big benefits, both 
for scientists who download large image sets 
to feed data-hungry machine-learning algorithms 
and for those who might make new 
discoveries in others’ data. 


Share and share alike 


It was a data-hungry scientific community that 
drove Kate McDole, a developmental biologist 
at the LMB, to use an image database. 

McDole, then working at the Howard 
Hughes Medical Institute's Janelia Research 
Campus in Ashburn, Virginia, had imaged 
mouse embryos every five minutes as they 
developed, yielding terabytes of data and a 
high-resolution developmental atlas that has 
generated significant interest. “People are forever 
asking me, did you look at this tissue, did 
you look at that tissue?” So she looked for a way 
to share all those terabytes. 

The journal offered only gigabytes of space, 
much less than McDole needed. (“Oh, gigabytes,” 
she scoffs, “gigabytes are cute.”) So she 
uploaded the atlas to the IDR, a free service 
developed by Swedlow and his colleagues. 
The data transfer took the better part of a 
week, she says. But now, anyone with a web 
browser can scroll through her data set, find 
their favourite tissues, or compare their results 
with hers. McDole herself often uses the IDR 
at conferences, to show colleagues data she 
doesn't carry on her laptop. 

Such databases offer more than a storage 
location, says Jan Ellenberg, a cell and molecular 
biologist at EMBL in Heidelberg, Germany, 
and researchers shouldn't simply drop their 
data sets into small, project-specific archives 
or generic cloud storage. “Just dumping the 
data somewhere doesn’t mean people can use 
it,” Ellenberg explains. “You need to organize 
the data, you need to annotate it, and curate 
it.” Browsers of McDole’s data set, for instance, 
can scan the metadata to find out information 
such as the strain of mice she used and the 
specific fluorescent labels she imaged. 

Patrick Combes, global technical leader 
for health care and life sciences at Amazon 
Web Services in Seattle, agrees. “Storing a 
data set on Amazon doesn’t automatically 
enhance it,” he says. But if scientists handle 
processing, curation and annotation, 
Amazon can be a secure, reliable data host, 
he says. It already houses several widely used 
resources, including raw data from the Allen 
Brain Observatory and NeuroData. 

Researchers can typically upload their 
data to life-science image databases at no 
cost, because storage, curation and main- 
tenance are often funded by grants or other 
benefactors. Shuichi Onami, a developmental 
biologist at the RIKEN Center for Biosystems 
Dynamics Research in Kobe who founded the 
SSBD database, obtained funding from institutions 
including RIKEN; the Japan Science and 
Technology Agency; and the nation’s Ministry 
of Education, Culture, Sports, Science and 
Technology. The database is “completely free” 
to the user, says Onami. Now he is expanding it 
beyond developmental biology, to include any 
biological data set that contains spatiotemporal 
information, as well as static images taken 
with state-of-the-art technologies. 

It’s also generally free to download data sets, 
and often to reuse and republish them: repositories 
frequently use Creative Commons 
licences that make availability transparent. 


Pick and choose 

Databases differ in the sizes of files they will 
accept, whether images must be linked to a published 
study, and their research focus. If your 
scientific community already has a specialized 
data house, Vogelstein recommends using that. 

But there are general repositories. Figshare, 
for instance, accepts any kind of data, up to 
5 gigabytes per file, for free. It can sometimes 
raise the limit, says founder Mark Hahnel: 
the biggest Figshare data set measures in terabytes. 
(Figshare is owned by Digital Science, a 
firm operated by the Holtzbrinck Publishing 
Group, which has a share in Nature's publisher, 
Springer Nature.) Other free, catch-all type 
services include Zenodo and Dryad. 

Figshare also has contracts with universities, 
funders and publishers (including Springer 
Nature), which pay an annual fee for extra 
benefits. Last year, it set up a repository of 
data from research funded by the US National 
Institutes of Health (NIH), which now expects 
its grant recipients to make their results freely 
available. The site is meant for data that don’t 
fit neatly into subject-specific banks, and currently 
hosts dozens of data sets. Unlike with 
the standard Figshare service, the NIH has 
control over the repository: what kinds of 
content are allowed, for example, and what 
kinds of metadata are required. NIH grantees 
benefit from assistance with metadata for their 
submissions, among other features. 

The IDR databases — there is one for images 
of cells and one for tissues — are tightly curated, 
says Swedlow. He and the other curators seek 
reference data sets linked to publications, 
such as results from large screening studies 
that would be of use to a wide audience. They 
ensure that the data are properly formatted 
and annotated with relevant metadata, such 
as information on the microscope used and 
experimental treatments applied. 

Last July, EMBL-EBI announced a service 
called the BioImage Archive, which will host 
both the IDR and EMPIAR, as well as the more 
general BioStudies database. The institute will 
support further curated, community-specific 
databases in future, says Jo McEntyre, associate 
director for services at EMBL-EBI. With 
support from EMBL and the funding agency 
UK Research and Innovation, the BioImage 
Archive will be maintained for “as long as it’s 
scientifically useful”, she promises. Figshare, 
Hahnel says, “will persist forever”, although 
he admits the contract guarantees only a 
decade. “We do all of the boring infrastruc- 
ture to make sure those data persist,” he says. 


Other people's data 

These services make it easier to find, share and 
store big data sets. But as with DNA and protein 
databases, the hope is that image-surfers will 
find new science in others’ data. 

Demonstrating this potential, Swedlow and 
his colleagues combed images from three 
separate studies of cell elongation in the 
IDR. Two were from the human cancer-cell 
line HeLa; one was in fission yeast; all three 
imaged cells missing a variety of genes. 
“Each study gets different results, but they're 
related,” says Swedlow. Together, these studies 
allowed him and his team to identify a larger, 
more complete network of genes involved in 
elongation than they could get from any one 
data set alone. 

A study posted on the arXiv preprint server 
last year reports that of almost 532,000 journal 
articles published by PLOS and BioMed 
Central, those that linked to a data repository 
had up to a 25% higher citation impact than 
those that didn't. 

With time to mature, image databases 
could yield more than just one-off discoveries, 
Swedlow says. After all, bioinformatics itself 
grew out of DNA archives. 

“Hopefully,” he says, “we end up stimulating 
the development of whole fields.” 


Amber Dance is a freelance science journalist 
near Los Angeles, California. 
As a NASA engineer, I have the coolest 
job: I play with robots in a ‘sandbox’. 
Anything a lander or rover on Mars 
does, we first test here on Earth in 
our sandbox filled with crushed 
garnet. We need a dust-free space, and 
garnet is so hard that it creates no dust. We 
use bright lights to mimic the sunlight on 
Mars, so pictures taken here are comparable 
to those captured on the red planet. 

For the past few years, I've been working 
on the international InSight mission, which 
is studying the interior structure of Mars. 
The planet doesn’t seem to have tectonic 
plates, so the surface that the landers are 
digging into has the same composition that 
it had one million years ago. 

Before the mission launched on 5 May 
2018, we spent about one year testing a full-scale 
model, shown behind me. We needed 
to ensure that the lander would be able to 
use all its equipment no matter what angle it 
landed at relative to the surface. We tested 
all possible angles. On 26 November 2018, 
InSight landed at a perfect two-degree tilt. 

We needed to troubleshoot how InSight 
deploys its equipment, so we set up the 
sandbox like the landing site using a process 
called ‘marsforming’. We used augmented-reality 
headsets, loaded with a digital map 
based on pictures taken by the lander. If you 
look down with one on, you are standing on 
‘Mars’. That brought tears to my eyes. 

The lander’s ‘mole’, a heat probe that is 
supposed to hammer five metres into the 
surface, is malfunctioning, so we needed to 
deepen the sandbox to find a fix. We raised 
the lander model on blocks and added an 
extra crate of garnet to make a small, deep 
space in which I could position a backup heat 
probe, as I'm shown doing, to test solutions 
to get the mole on Mars underground. 

As a five-year-old, I wanted to become the 
first astronaut to walk on Mars. Since 2008, 
I've been applying for the astronaut corps. 
For now, I have the next-best thing. 


Marleen Martinez Sundgaard is the lead 
systems test-bed engineer for the InSight and 
Psyche missions at the NASA Jet Propulsion 
Laboratory in Pasadena, California. Interview 
by Amber Dance. 
How should self-driving cars make decisions when human lives hang 
in the balance? The Moral Machine experiment¹ (MME) suggests that 
people want autonomous vehicles (AVs) to treat different human lives 
unequally, preferentially killing some people (for example, men, the 
old and the poor) over others (for example, women, the young and the 
rich). Our results challenge this idea, revealing that this apparent preference 
for inequality is driven by the specific ‘trolley-type’ paradigm 
used by the MME. Multiple studies with a revised paradigm reveal that 
people overwhelmingly want autonomous vehicles to treat different 
human lives equally in life and death situations, ignoring gender, age 
and status (a preference consistent with a general desire for equality). 

The large-scale adoption of autonomous vehicles raises ethical chal- 
lenges because autonomous vehicles may sometimes have to decide 
between killing one person or another. The MME seeks to reveal 
people's preferences in these situations and many of these revealed 
preferences, such as ‘save more people over fewer’ and ‘kill by inaction 
over action’, are consistent with preferences documented in previous 
research. 

However, the MME also concludes that people want autonomous vehicles 
to make decisions about who to kill on the basis of personal features, 
including physical fitness, age, status and gender (for example, saving 
women and killing men). This conclusion contradicts well-documented 
ethical preferences for equal treatment across demographic features 
and identities, a preference enshrined in the US Constitution, the United 
Nations Universal Declaration of Human Rights and in the Ethical Guideline 9 
of the German Ethics Code for Automated and Connected Driving. 

We suggest that the MME finds preferences for inequality across 
lives because its methodology is relatively insensitive to preferences 
for equality. The MME uses trolley-type dilemmas that force people 
to choose between killing one person (or set of people) versus killing 
another person (or set of people). Because this paradigm assumes 
inequality (for example, should we program AVs to kill men or women?), 
it has difficulties revealing whether people prefer equality (for example, 
should we program AVs to ignore gender?). 

What would happen if people indicated their ethical preferences 
in a revised paradigm, one that allowed AVs to treat different humans 
equally? We explored this possibility in study 1, in which people were 
randomly assigned to either a ‘forced inequality’ or an ‘equality allowed’ 
condition. Participants were drawn from two quasi-representative 
samples across two Western countries (US, N = 1,174; UK, N = 1,178). 

The forced inequality condition was a simplified replication of the 
MME, testing whether participants thought autonomous vehicles 
should (1) kill group A (for example, elderly people) to save group B 
(for example, children) or (2) kill group B to save group A. As in the 
MME, we examined both personal features (for example, kill men versus 
women) and structural features (for example, kill many people versus 
few people) in driving situations. However, unlike the MME (which 
used composite groups that simultaneously varied both personal and 
structural features), we examined each of these features individually 
(see Supplementary Information and https://osf.io/wy8tq/?view_only=e5907f552f5e4a8a901cbdd2d4c035f6 
for details and data). 

As Fig. 1 shows, results from the forced inequality condition closely 
match the global effects of the MME. Beyond the general value of replication, 
this validates our paradigm: although we used a different 
sample and a simpler method, we obtained the same results as the MME. 

The equality allowed condition was similar to the forced inequality 
condition, but with the addition of a third option, (3) treat the lives 
of groups A and B equally (for example, treat the lives of children and 
elderly people equally). As Fig. 1 shows, people overwhelmingly selected 
this option when it was available, revealing that they want autonomous 
vehicles to treat people equally. For example, when forced to choose 
between men and women, 87.7% chose to save women, but 97.9% of 
people actually preferred to treat both groups equally. See Supple- 
mentary Table 1 for full results. 

Admittedly, it may be difficult to program a deep sense of egalitarianism 
into machines, but autonomous vehicles can functionally value 
human lives equally by simply ignoring (or failing to detect) features 
such as gender, age and social class. Restricting the ethical choice set 
of autonomous vehicles is consistent with emerging research revealing 
that people prefer autonomous machines not to make important 
ethical decisions. Ignoring personal features is also more consistent 
with the current technical capacities of AVs. 

One question about our data is whether participants prefer the ‘treat 
equally’ option simply because it fails to mention killing. Study 2 ruled 
out this concern by replicating the equality allowed condition (N = 843 
US participants from an online panel) with a modified third option: 
that autonomous vehicles should decide who to save and who to kill 
without considering their personal features. Consistent with study 1, 
people expressed robust preference for AVs to treat people equally 
by ignoring personal features. For example, people preferred self- 
driving cars to not consider gender (92.6%), fitness (88.8%) or status 
(84.7%). The only substantial departure from study 1 was lawfulness: 
53.1% of people preferred to spare law abiders over law breakers. See 
Supplementary Table 2 for full results. 

Of course, AVs might sometimes have to choose between killing 
different sets of people, but these decisions can rely solely on structural 
rather than personal features. In study 3, participants (N = 993 US 
participants from an online panel) chose which of two autonomous 
vehicles should be allowed on the road: one that makes ethical decisions 
on the basis of the structural features revealed by the MME (for example, 
saving more people versus fewer, killing by inaction versus action), 
and another on the basis of both structural and personal features (for 
example, saving people based on age, gender, and status). Consistent 
with our predictions, 89.9% of participants chose the structural-features-only 
car, once again expressing a desire for AVs that ignore 
personal features in ethical dilemmas. 

Fig. 1 | People’s choices for how autonomous vehicles should be 
programmed to act in situations where human lives are at stake (study 1). 
Personal features reflect individual identity characteristics (for example, age 
and status) and structural features reflect characteristics of the situation. The 
forced inequality condition (n = 1,129) replicates the MME, which makes people 
choose between two options, whereas the equality allowed condition 
(n = 1,223) provides a third option of equal treatment. See Supplementary Fig. 1 
for confidence intervals. 

We note a number of caveats to our studies. Our samples were smaller 
than the millions who completed the MME. However, using quasi-representative 
samples in our main study (rather than a convenience sample) 
helps generalize the results to the populations of two large Western 
countries. We acknowledge that ethical preferences may vary across 
cultures, but our key point is that the current MME paradigm is relatively 
insensitive to preferences for equality, regardless of participant culture. 
Finally, we recognize that people often do discriminate on the basis of 
personal features, as sexism, classism, racism and ageism all illustrate. 
However, even people who implicitly act to perpetuate inequality often 
explicitly espouse ideas of equality. 

To frame the MME in a broader context, consider a thought experiment 
about some personal features not assessed by the MME: religion, 
race, and disability. What might happen if the MME forced people to 
choose between black and white people? Aggregating people's decisions 
could reveal a racial bias, but this would not mean that people 
want to share the road with racist autonomous vehicles. The same logic 
applies to the features that were included in the MME. Do people truly 
want to live in a world with sexist, ageist and classist self-driving cars? 
This thought experiment further suggests that aggregating across 
forced-choice preferences may not accurately reveal how people want 
autonomous vehicles to be programmed to act when human lives are 
at stake. 

Although we must be careful about interpreting the results of the 
MME, we emphasize its value. Every methodology has limitations, 
and the MME reveals both basic moral cognitive processes and global 
preferences for saving lives in a forced-choice paradigm. More broadly, 
the MME highlights the important ethical questions posed by AVs— 
questions that society will soon need to address. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

All materials, data and code used in the studies are available at 
https://osf.io/wy8tq/?view_only=e5907f552f5e4a8a901cbdd2d4c035f6. 
Study description 
All three reported studies were quantitative. 

Research sample 
Study 1: 
Three thousand and three people were recruited by Prolific in two nationally representative samples (on age, gender and ethnicity), one from the UK and one from the USA. After a few days of data collection the age criteria were loosened, such that older ages are still a little under-represented (e.g., for people older than 58 years old, 328 instead of 467 in the US sample and 446 instead of 463 in the UK sample). 

In the UK representative sample (N = 1503), 772 were female and 731 male; 271 participants were between the ages of 18 and 27, 263 between 28 and 37, 282 between 38 and 47, 240 between 48 and 57, and 486 participants older than 58. One hundred and fifteen participants were Asian, 55 black, 31 mixed, 24 other and 1278 were white. Four of the responses were empty, such that the final sample size was 1498. 

In the US representative sample (N = 1500), 769 were female and 731 male; 339 participants were between the ages of 18 and 27, 327 between 28 and 37, 258 between 38 and 47, 248 between 48 and 57, and 328 participants older than 58. Ninety-six were Asian, 197 black, 37 mixed, 30 other and 1140 were white. 

Study 2: 
One thousand and four people were recruited via Amazon's Mechanical Turk (429 male, 566 female, 9 other/preferred not to disclose; age: M = 35.00, SD = 12.22). 

Study 3: 
One thousand and nine people were recruited via Amazon's Mechanical Turk (433 male, 570 female, 6 other/preferred not to disclose; age: M = 37.26, SD = 12.80). 

Sampling strategy 
Study 1 used stratified sampling. Studies 2-3 used convenience samples. 
Sample size: We wanted to far exceed typical power recommendations, and given that isolating the true proportion of the population is important, believed 2000 participants would keep the standard error of the mean sufficiently low for our main study (Study 1), and 1000 for the additional studies (Studies 2 and 3). 

Data collection 
Data was collected on Qualtrics XM through online panels such as Prolific (Study 1) and Amazon's Mechanical Turk (Studies 2-3). 

Timing 
Study 1: The UK sample was collected between April 18th and April 23rd 2019. The US sample was collected between April 24th and April 30th. Data for Study 2 was collected on July 18th 2019. Data for Study 3 was collected on April 15th 2019. 

Data exclusions 
All exclusions were pre-registered (links to the pre-registration appear in the Supplemental Information). 

Study 1: 
Participants completed three attention checks. In the first attention check they were asked what day was yesterday and what they asked for breakfast. In the second attention check participants were shown three sliders, marked X, Y and Z. They were asked to set X on 15, Y to be greater than X and evenly divisible by 10, and Z to be larger than Y. In the third attention check participants were asked whether they had answered questions about a self-driving car or a human driver, and whether they had an option of having people treated equally. Six hundred and forty seven participants failed at least one of the attention checks and were excluded from the analysis as planned in the pre-registration. 

Study 2: 
Participants completed two attention checks. In the first attention check they were asked what day was yesterday and what they asked for breakfast. In the second attention check participants were asked whether they had answered questions about a self-driving car or a human driver, and whether they had an option of having people treated equally. One hundred and fifty seven participants failed at least one of the attention checks and were excluded from the analysis as planned in the pre-registration. 

Study 3: 
Participants were asked what day was yesterday and what they asked for breakfast. Sixteen participants failed this attention check and were excluded from the analysis. 

Non-participation 
NA 

Randomization 
Randomization was done with the "randomize" function in Qualtrics. 
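
The sample-size rationale in the reporting summary appeals to keeping the standard error low when estimating a population proportion. As a rough illustration only, the sketch below computes the textbook standard error of a sample proportion, sqrt(p(1 - p)/n), at the worst case p = 0.5 for the two planned sample sizes. The function name and the choice of the worst case are assumptions made for this example; the authors do not state which formula or threshold they used.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion p estimated from n independent responses."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# p = 0.5 maximizes p * (1 - p), so it gives the largest possible standard error.
for n in (1000, 2000):
    se = proportion_standard_error(0.5, n)
    print(f"n={n}: worst-case SE = {se:.4f}, approx. 95% margin = {1.96 * se:.4f}")
```

Under these assumptions, doubling the sample from 1000 to 2000 shrinks the standard error by a factor of sqrt(2), not by half.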
In ‘The Moral Machine experiment’ (MME)¹, we argued that policymakers 
would benefit from being aware of citizens’ preferences 
regarding the behaviour of autonomous vehicles in critical situations: 
situations in which an autonomous vehicle cannot save everyone, 
but can still decide to save one group of road users or another. In the 
accompanying Comment, Bigman and Gray make the important point 
that the way we measure these preferences can affect the results we 
obtain. 

Actual consumer choices cannot yet be recorded. If we want the 
ethics of these vehicles to be decided before they hit the market, we 
can only collect stated preferences, based on hypothetical choices. 
The MME used a standard method for collecting stated preferences 
between multidimensional outcomes: users chose between pairs of 
unavoidable accidents (which varied along multiple dimensions), 
and the importance of each dimension was statistically extracted 
from their choices using conjoint analysis. Typical surveys can only 
do this for a few dimensions, because of the exponential increase 
in required sample size for every additional dimension. Given the 
unusual scale of the MME, we were able to investigate nine dimen- 
sions simultaneously. 
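
To make the idea of extracting the importance of each dimension from paired choices more concrete, here is a minimal simulation sketch. It is not the MME's actual analysis pipeline: the choice model, the three binary attributes, their weights and the function names are all hypothetical. It relies only on the general principle that, when attribute levels are randomized independently, the difference in choice rates between the two levels of an attribute estimates that attribute's average effect on choice.

```python
import random


def simulate_choices(weights, n_pairs, seed=0):
    """Simulate forced choices between two randomly generated profiles.

    Each profile is a tuple of 0/1 attribute levels. The simulated respondent
    picks the profile with the higher noisy utility (a hypothetical choice model).
    Returns a list of (profile, chosen) rows, two rows per pair.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        pair = [tuple(rng.randint(0, 1) for _ in weights) for _ in range(2)]
        utils = [sum(w * x for w, x in zip(weights, p)) + rng.gauss(0, 1) for p in pair]
        winner = 0 if utils[0] >= utils[1] else 1
        for i, profile in enumerate(pair):
            rows.append((profile, 1 if i == winner else 0))
    return rows


def attribute_effect(rows, k):
    """Difference in choice rate between level 1 and level 0 of attribute k."""
    chosen_if_1 = [c for p, c in rows if p[k] == 1]
    chosen_if_0 = [c for p, c in rows if p[k] == 0]
    return sum(chosen_if_1) / len(chosen_if_1) - sum(chosen_if_0) / len(chosen_if_0)


# Hypothetical weights: a strong attribute, a weak one, and one that is ignored.
weights = [1.5, 0.5, 0.0]
rows = simulate_choices(weights, n_pairs=20000)
estimates = [attribute_effect(rows, k) for k in range(len(weights))]
for k, est in enumerate(estimates):
    print(f"attribute {k}: estimated effect on choice probability = {est:+.3f}")
```

In this toy setting an attribute that respondents ignore yields an estimate near zero. Each added binary attribute also doubles the number of distinct profiles (2^d for d attributes), which gives one intuition for why smaller surveys restrict themselves to a few dimensions.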

Bigman and Gray adopted a different method. Rather than having 
users go through multiple pairs of nine-dimensional outcomes, they 
asked eight separate questions about general policy preferences, one 
per dimension (the human-nonhuman dimension was not used in 
their survey). For example, they asked: should self-driving cars be pro- 
grammed to (1) kill children and save elderly people, (2) kill elderly 
people and save children, or (3) treat the lives of children and elderly 
people equally? 

Bigman and Gray report that for all but one question (saving many 
versus few) the most frequent response was (3). For example, about 
80% of participants said that self-driving cars should ‘treat the lives of 
children and elderly people equally’. 

These results roughly agree with the Moral Machine results on 
some dimensions (for example, the weak preference for inaction), 
and disagree on others (for example, the preference for saving 
children), but the differences between the two methods, measures 
and statistical analyses make any direct comparison difficult. The 
two different methods may differently tap a single, stable set of 
preferences or they may elicit from respondents different facets 
of fragmented, inconsistent preferences that have yet to be solidi- 
fied. Each approach comes with its own limitations, and its own 
usefulness. The Moral Machine approach allows us to measure 
the weight of different moral priorities when pitted against each 


other, rather than considered in isolation; but participants cannot 
explicitly state that one dimension (for example, age) should not be 
taken into account. Of course, since each scenario involved at least 
two moral dimensions, respondents could avoid making decisions 
based on dimensions they felt should not be programmed into the 
cars. Participants who believed that the vehicle should be blind to 
age, for instance, could endeavour to be systematically blind to age 
themselves in how they responded to the scenario pairs. Had mil- 
lions of participants made this choice, this would have statistically 
resulted in an absence of a preference for age, and it would have 
ranked at the bottom of the list of the nine moral dimensions we 
tested. It remains, however, that individuals had no opportunity to 
explicitly express this preference for equality. 
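
The argument above, that respondents who ignore age would produce no measurable age preference, can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the MME analysis: the two binary attributes, the 80% rule for sparing the larger group, and the helper names `simulate_age_blind` and `preference_gap` are all hypothetical, and the estimator is a simple difference in sparing probabilities rather than the full conjoint analysis.

```python
import random


def simulate_age_blind(n_pairs: int, seed: int = 0):
    """Simulate pairs of outcomes judged by respondents who ignore age.

    Each outcome has two randomised binary attributes: 'young' and 'many'.
    Returns a list of (young, many, spared) rows, two rows per pair.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_pairs):
        a = {"young": rng.randint(0, 1), "many": rng.randint(0, 1)}
        b = {"young": rng.randint(0, 1), "many": rng.randint(0, 1)}
        if a["many"] != b["many"]:
            # Hypothetical rule: spare the larger group 80% of the time.
            favoured = a if a["many"] else b
            spare_a = (favoured is a) == (rng.random() < 0.8)
        else:
            # Equal group sizes: age is ignored, so the choice is a coin flip.
            spare_a = rng.random() < 0.5
        rows.append((a["young"], a["many"], int(spare_a)))
        rows.append((b["young"], b["many"], int(not spare_a)))
    return rows


def preference_gap(rows, column: int) -> float:
    """Difference in sparing probability between attribute levels 1 and 0."""
    spared_1 = [r[2] for r in rows if r[column] == 1]
    spared_0 = [r[2] for r in rows if r[column] == 0]
    return sum(spared_1) / len(spared_1) - sum(spared_0) / len(spared_0)


rows = simulate_age_blind(20_000)
age_gap = preference_gap(rows, column=0)
size_gap = preference_gap(rows, column=1)
print(f"age gap:  {age_gap:+.3f}")
print(f"size gap: {size_gap:+.3f}")
```

In this toy setting the gap for age stays close to zero while the gap for group size is clearly positive, which mirrors the point that systematic blindness to a dimension would show up as an absent preference in the aggregate.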

The approach used by Bigman and Gray does offer participants 
the opportunity to explicitly express a preference for equality. One 
limitation of this approach is that measurement becomes sensitive 
to social desirability, experimental demands and framing effects 
(which is not to say that other methods do not have this problem). 
For example, consider the phrasing of the three response options 
above, and note how the word ‘kill’ disappears from the third option, 
making it instantly more attractive at a surface level. The first two 
options clearly describe trade-offs, whereas the third option only has 
positive connotations. We could suggest an opposite framing for the 
third option: ‘the self-driving car should indiscriminately kill children 
and elderly people’. This is as valid a description as the one used by 
Bigman and Gray, but it seems less attractive in this negative framing. 
Indeed, in their study 2, Bigman and Gray used a framing that stands 
somewhere in between the positive framing used in study 1 and the 
negative framing we suggest above, and this intermediate framing 
appeared to have an effect on the results: for half of the questions, 
the frequency of the ‘equality’ response decreased by 16 percentage 
points to 27% (as can be seen by comparing their Supplementary Table 
1 and Supplementary Table 2). 

We should note that an unpublished portion of the MME used a 
third method, one similar to that of Bigman and Gray, but one that 
avoided this loaded language confounder. After making 13 decisions, 
users had the option to ‘help us better understand (their) decisions’. 
Users who agreed were taken to a page where they could position 
one slider for each of the nine dimensions explored by the Moral 
Machine. For example, one slider showed a baby on the left side, an 
elderly person on the right side, and was labelled ‘Age preference’. 
Users could move the slider to express how important this dimension 
should be: more to the left if they wanted to save younger lives, 

Fig. 1 | Distribution of explicit preferences stated by Moral Machine users. 
Sliders were presented with a default position determined by the responses 
users gave to the Moral Machine ‘judge’ mode. a, Preferences of users who 
moved at least one slider from its original position (585,531 users; >99% of the 
users). b, Preferences of users who changed sliders from their original position 
(range: 190,862–581,496 users). In both cases, only row S (social status 
preference) shows a clear gap between the preferences extracted from the 
Moral Machine¹ and the preferences explicitly expressed by users. 


more to the right if they wanted to save older lives. Importantly, this 
method did give participants the option to treat the lives of children 
or elderly people (or men or women, or humans or pets) equally; 
participants could easily express such a preference by positioning 
the slider at the midpoint of the scale. This is, in essence, the method 
used by Bigman and Gray, except that it uses a continuous measure 
rather than a three-point scale and does not use a textual description 
for the midpoint of the scale. 

The original position of the sliders was not systematically the mid- 
dle point of the scale, but rather a rough estimation of the prefer- 
ence of each individual user based on their responses to the Moral 
Machine. Thus, users had the opportunity to move sliders if they 
disagreed with the estimation. More than 99% of users who saw the 
slider page moved at least one slider from its original position. Figure 
1a shows the final position of all sliders for these 585,531 users, 
thus reflecting their choices when given the option of explicitly 
valuing all lives equally. Figure 1b shows the final position of each 
slider only for those users who actually moved it. This is a stronger 
test, since it restricts the data to the responses of users who actively 
expressed a preference. 
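
The difference between the two subsets just described (users who moved any slider versus users who moved a given slider) can be sketched as follows. The data structure, the 0 to 1 slider scale, the toy values and the names `SliderResponse`, `panel_a` and `panel_b` are assumptions made for illustration; they do not come from the released MME code.

```python
from dataclasses import dataclass


@dataclass
class SliderResponse:
    default: float  # position pre-set from the user's scenario choices
    final: float    # position when the user left the page

    @property
    def moved(self) -> bool:
        return self.final != self.default


# Hypothetical toy data: user id -> dimension name -> response.
users = {
    "u1": {"age": SliderResponse(0.2, 0.5), "status": SliderResponse(0.7, 0.7)},
    "u2": {"age": SliderResponse(0.3, 0.3), "status": SliderResponse(0.8, 0.5)},
    "u3": {"age": SliderResponse(0.4, 0.4), "status": SliderResponse(0.6, 0.6)},
}


def panel_a(users, dimension):
    """Final positions for users who moved at least one slider (any dimension)."""
    return [
        sliders[dimension].final
        for sliders in users.values()
        if any(s.moved for s in sliders.values())
    ]


def panel_b(users, dimension):
    """Final positions for users who moved this particular slider."""
    return [
        sliders[dimension].final
        for sliders in users.values()
        if sliders[dimension].moved
    ]


print(panel_a(users, "age"))
print(panel_b(users, "age"))
```

The second filter is the stricter one: a user who left the age slider untouched still contributes to the first subset if they moved some other slider, but never to the second.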

Both figures tell a similar, three-part story. At the top of each figure, 
we can see that four preferences that were estimated as strong in the 
MME (saving humans, saving more lives, saving younger lives and 
saving pedestrians who cross legally; Fig. 2) are confirmed as strong. 
For these four dimensions, the distributions of responses are clearly 
skewed, and the modal response is not equality. At the bottom of each 
figure, four preferences that were identified as weak in the MME (inaction, 
saving pedestrians, saving fit characters and saving women) 
are confirmed as weak. The modal response for these dimensions is 
indeed equality. 
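
One way to check whether the modal response for a dimension is equality is to bin the slider positions and test whether the most frequent bin is the central one. The sketch below assumes a 0 to 1 scale with 0.5 as the midpoint and uses five bins; both choices, the toy data and the function names are hypothetical and are not the binning used for Fig. 1.

```python
from collections import Counter


def modal_bin(positions, n_bins: int = 5):
    """Return the index of the most frequent bin for positions in [0, 1]."""
    counts = Counter(min(int(p * n_bins), n_bins - 1) for p in positions)
    return counts.most_common(1)[0][0]


def modal_response_is_equality(positions, n_bins: int = 5) -> bool:
    """True if the modal bin is the central one (n_bins should be odd)."""
    return modal_bin(positions, n_bins) == n_bins // 2


# Hypothetical slider positions on a 0-1 scale (0.5 = treat both sides equally).
weak_preference = [0.5, 0.48, 0.52, 0.5, 0.9, 0.1, 0.55]
strong_preference = [0.05, 0.1, 0.15, 0.1, 0.5, 0.2, 0.0]

print(modal_response_is_equality(weak_preference))
print(modal_response_is_equality(strong_preference))
```

With these toy values the first distribution peaks at the midpoint and the second is skewed towards one end, matching the weak and strong patterns described in the text.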

Only for one dimension do we find a clear gap between the prefer- 
ences extracted from the Moral Machine and the preferences explicitly 
expressed by users. Whereas users’ scenario-based choices indicated a 
Fig. 2 | Preferences extracted from the conjoint analysis of the Moral 
Machine dataset. This figure is a simplified version of Fig. 2a from the MME¹. 
The x axis shows the average marginal causal effect for each preference. In each 
row, ΔPr is the difference between the probability of sparing characters 
possessing the attribute on the right, and the probability of sparing characters 
possessing the attribute on the left, aggregated over all other attributes 
(n = 35.2 × 10⁶). 


preference for saving high-status characters over low-status characters, 
their expressed preference on the sliders was to treat them equally. 
Here we see the value of giving people the opportunity to express an 
explicit preference: While their scenario-based choices may well show 
an implicit bias against lower-status victims, the users would probably 
be unhappy if this bias was actually acted on. Of course, it is extremely 
unlikely that policymakers would propose that autonomous vehicles 
should discriminate on the basis of social status, but we can still remain 
vigilant for other gaps between implicit biases and explicit preferences 
for equality, whenever they concern characteristics that may enter 
policy debates. 

Self-driving car fatalities are an inevitability, but the type of fatalities 
that ethically offend the public and derail the industry are not. As 
a result, it seems important to anticipate, as accurately as we can, how 
the public will actually feel about the ethical decisions we program into 
these vehicles. Since any method used to collect these preferences will 
have its own biases and limitations, the methodological diversity advocated 
by Bigman and Gray, and the broad involvement of psychologists 
more generally, will be critical to reaching that goal. 


Methods 

Ethical compliance 

This study was approved by the Institute Review Board at Massachusetts 
Institute of Technology. The authors complied with all relevant ethical 
considerations. Participants were briefed on the purpose of the study 
and were given the chance to opt out from having their data used. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 
Data and code that can be used to reproduce Figs. 1 and 2 are available 
at https://bit.ly/2VKyMh}. 


1. Awad, E. et al. The Moral Machine experiment. Nature 563, 59–64 (2018). 

2. Bigman, Y. E. & Gray, K. Life and death decisions of autonomous vehicles. Nature 
https://doi.org/10.1038/s41586-020-1987-4 (2020). 

3. Hainmueller, J., Hopkins, D. J. & Yamamoto, T. Causal inference in conjoint analysis: 
understanding multidimensional choices via stated preference experiments. Polit. Anal. 22, 1–30 (2014). 


Acknowledgements J.-F.B. acknowledges support from the ANR-Labex Institute for Advanced 
Study in Toulouse, the ANR-3IA Artificial and Natural Intelligence Toulouse Institute, and the 
grant ANR-17-EURE-0010 Investissements d’Avenir. I.R. acknowledges funding from the Ethics 
& Governance of Artificial Intelligence Fund. 


Author contributions I.R., A.S. and J.-F.B. planned the research. I.R., A.S., J.-F.B., E.A. and S.D. 
designed the experiment. E.A. and S.D. built the platform and collected the data. E.A., S.D., 
R.K., J.S. and A.S. analysed the data. E.A., S.D., R.K., J.S., J.H., A.S., J.-F.B. and I.R. interpreted the 
results and wrote the paper. 


Competing interests The authors declare no competing interests. 


Additional information 
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-1988-3. 

Correspondence and requests for materials should be addressed to A.S., J.-F.B. or I.R. 
Reprints and permissions information is available at http://www.nature.com/reprints. 


© The Author(s), under exclusive licence to Springer Nature Limited 2020 


Nature | Vol 579 | 5 March 2020 | E5 




Last updated by author(s): May 23, 2019 


Reporting Summary 


Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


[X] The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


[X] For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


[X] Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 